CCDM2016 Data Mining Conference: Clustering Analysis Workshop

Clustering Analysis Workshop

Overview

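    Clustering analysis is one of the important tools in data mining, machine learning, pattern recognition, and related research fields, and it has been a hot research topic in these fields for more than a decade. However, clustering analysis is an ill-posed problem: a unified and universally accepted definition of a "cluster" does not yet exist. On the one hand, this leaves the theoretical foundations of clustering analysis weak; on the other hand, it has led to a large number of clustering algorithms in the literature, which in turn puts users in an awkward position, since it is hard to choose the algorithm that suits their own clustering problem. In addition, the big data that is increasingly common in everyday life poses a new challenge for clustering analysis, namely how to perform clustering quickly and effectively. We therefore propose "Clustering Analysis" as one of the forums of CCDM2016, in order to promote research on clustering analysis among scholars in China.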

Forum Chair

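Jian Yu (于剑) is a professor and doctoral supervisor at Beijing Jiaotong University, head of the Department of Computer Science in the university's computer school, director of the Beijing Key Laboratory of Traffic Data Analysis and Mining, a council member of the China Computer Federation (CCF), secretary-general of the CCF Technical Committee on Artificial Intelligence and Pattern Recognition, a council member of the Chinese Association for Artificial Intelligence (CAAI), vice chair of the CAAI Technical Committee on Machine Learning, and a member of the academic committee of the National Key Laboratory of Digital Printing. Prof. Yu has led several projects funded by the National Natural Science Foundation of China. Prof. Yu's main research interests are machine learning and data mining.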



Speakers

Prof. Miin-Shen Yang (楊敏生) received the BS degree in mathematics from the Chung Yuan Christian University, Chung-Li, Taiwan, in 1977, the MS degree in applied mathematics from the National Chiao-Tung University, Hsinchu, Taiwan, in 1980, and the PhD degree in statistics from the University of South Carolina, Columbia, USA, in 1989. In 1989, he joined the faculty of the Department of Mathematics at the Chung Yuan Christian University (CYCU) as an Associate Professor, and he has been a Professor there since 1994. From 1997 to 1998, he was a Visiting Professor with the Department of Industrial Engineering, University of Washington, Seattle, USA. From 2001 to 2005, he was the Chairman of the Department of Applied Mathematics at CYCU. Since 2012, he has been a Distinguished Professor of the Department of Applied Mathematics and the Director of the Chaplain’s Office at CYCU. His research interests include fuzzy clustering, applications of statistics, neural fuzzy systems, pattern recognition, and machine learning. Dr. Yang was an Associate Editor of the IEEE Transactions on Fuzzy Systems (2005-2011), and is an Associate Editor of Applied Computational Intelligence & Soft Computing and Editor-in-Chief of Advances in Computational Research. His awards include: 2008 Outstanding Associate Editor of IEEE Transactions on Fuzzy Systems, IEEE; 2009 Outstanding Research Professor of Chung Yuan Christian University; 2010 Top Cited Article Award 2005-2010, Pattern Recognition Letters; 2012-2018 Distinguished Professorship of Chung Yuan Christian University.

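Zhaohong Deng (邓赵红), PhD, is a professor at the School of Digital Media, Jiangnan University, an IEEE Senior Member, and a senior member of the China Computer Federation. Prof. Deng's research covers fuzzy logic, neural computation, and their applications in microbial process modeling and intelligent health. In applied basic research, Prof. Deng has served as principal investigator of one Jiangsu Province Distinguished Young Scholars Fund project, one project under the Ministry of Education's Program for New Century Excellent Talents, two National Natural Science Foundation of China projects, and one general project of the Jiangsu Province Natural Science Foundation. Prof. Deng has published more than 60 papers in mainstream domestic and international journals in computational intelligence, including 18 regular papers in IEEE/ACM Transactions journals, and, as a principal contributor, received one first-class Science and Technology Progress Award from the Ministry of Education. Between 2004 and 2012, Prof. Deng made multiple visits to Hong Kong for collaborative research, and from March 2013 to March 2014 spent a year as a visiting researcher at the University of California, Davis. Recent honors in research and teaching include the Jiangsu Province Distinguished Young Scholars Fund award, selection for the Ministry of Education's Program for New Century Excellent Talents, and the Jiangnan University Zhishan Young Scholar title. Prof. Deng currently serves on the editorial boards of four international journals, including Neurocomputing and PLOS One.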

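Zhanyu Ma (马占宇), PhD, is an associate professor at Beijing University of Posts and Telecommunications (BUPT), an adjunct associate professor at Aalborg University, Denmark, and director of a joint laboratory on lottery big data (北邮信通院-邦赢彩票大数据联合实验室). Prof. Ma studied at BUPT and at KTH Royal Institute of Technology, Sweden. Prof. Ma's main research area is non-Gaussian probabilistic models and their applications in multimedia signal processing, biomedical signal processing, and bioinformatics. Prof. Ma has led two National Natural Science Foundation of China projects, one Beijing Natural Science Foundation project, one project of the Ministry of Education's Scientific Research Foundation for Returned Overseas Chinese Scholars, and one Swedish postdoctoral research grant, and has participated in two EU collaborative research projects, one research project of the Swedish ÅF Foundation, and one project of the Swedish Foundation for International Cooperation in Research and Higher Education. Prof. Ma has published more than 40 academic papers, including work in IEEE Trans. on PAMI.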

Schedule

Time: afternoon of May 20, 2016


13:00-13:05 Opening remarks
Prof. Jian Yu (于剑)
13:05-14:05 Applications of fuzzy clustering in regression models
Prof. Miin-Shen Yang (楊敏生)
Abstract: This invited talk has three parts. Part I covers fuzzy cluster-wise regressions. Part II considers step-wise possibilistic c-regressions. Part III presents the newly proposed change-point regression models using fuzzy clustering.
  Regression analysis is used to evaluate the functional relationship between dependent and independent variables. Cluster-wise (also called switching) regression analysis embeds clustering techniques into regression models. Since Zadeh (1965) proposed fuzzy sets, which introduced the idea of partial membership described by membership functions, fuzzy set theory has been widely applied in clustering. For fuzzy clustering, the fuzzy c-means (FCM) algorithm, first proposed by Dunn (1974) and then extended by Bezdek (1981), is the most commonly used method. Since Quandt (1958, 1960) and Chow (1960) initiated research on switching regressions, they have been widely studied and applied. Hathaway and Bezdek (1993) first combined switching regressions with FCM by embedding fuzzy c-partitions into regression models, and referred to the result as fuzzy c-regressions (FCRs). However, FCRs are sensitive to noise and outliers. In Part I, I present the work of Yang et al. (2008), which applied the concept of the α-cut implemented clustering algorithm (FCMα) to FCR and created the FCRα algorithm to improve robustness against noise and outliers, especially for c-regression models.
  Krishnapuram and Keller (1993) proposed a possibilistic c-means (PCM) clustering approach that is more robust than FCM to noise and outliers. Our current study embeds possibilistic clustering into switching regression models, yielding possibilistic c-regressions (PCR). Although PCR handles outliers and noisy points better than FCR, it still depends heavily on initialization. In Part II, I present the step-wise possibilistic c-regressions (SPCR) method, which repeats PCR on a series of nested subsets, using the clustering results of the previous subset as good initial values for PCR on the succeeding subset. The proposed SPCR does not require initial values and is robust to noise and outliers. Several experiments demonstrate that the accuracy and efficiency of the proposed method are superior to those of other c-regression methods.
  Change-point (CP) regression models have been widely applied in various fields where detecting change-points (CPs) is an important problem. Detecting the location of CPs in regression models can be viewed as partitioning data points into clusters of similar individuals. In Part III, I present a newly proposed method, called the fuzzy CP (FCP) algorithm, for detecting the CPs and simultaneously estimating the parameters of the regression models. The fuzzy c-partitions concept is first embedded into the CP regression models and then transferred into pseudo memberships of data points belonging to each individual cluster, so that estimates of the model parameters can be obtained by the fuzzy c-regressions method. Subsequently, FCM clustering is used to obtain new iterates of the CP collection memberships by minimizing an objective function concerning the deviations between the predicted response values and the data values. Several numerical examples and real data sets are used, and experimental results show that the proposed FCP is an effective and useful CP detection algorithm for CP regression models.
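For readers unfamiliar with fuzzy c-regressions, the basic alternating scheme mentioned in the abstract (fuzzy memberships from squared residuals, then a membership-weighted least-squares fit per model) might be sketched as follows. This is only an illustrative sketch of plain FCR in the spirit of Hathaway and Bezdek (1993); it is not the α-cut, PCR, SPCR, or FCP method from the talk. The function name `fuzzy_c_regressions`, its parameters, the random initialization, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def fuzzy_c_regressions(X, y, c=2, m=2.0, n_iter=100, tol=1e-8, seed=0):
    """Hypothetical sketch of fuzzy c-regressions with linear models.

    X: (n,) or (n, d) inputs, y: (n,) responses, c: number of models,
    m > 1: fuzzifier. Returns (betas, U), where betas[i] holds the
    intercept and slopes of model i and U[i, k] is the membership of
    point k in model i.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept
    U = rng.dirichlet(np.ones(c), size=n).T     # (c, n), each column sums to 1
    betas = np.zeros((c, A.shape[1]))
    for _ in range(n_iter):
        W = U ** m
        for i in range(c):
            # Weighted least squares: scale rows by sqrt of the weights.
            sw = np.sqrt(W[i])
            betas[i] = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        # Squared residual of every point under every model (small offset
        # avoids division by zero for exact fits).
        E = (y[None, :] - betas @ A.T) ** 2 + 1e-12
        inv = E ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return betas, U

# Synthetic data drawn from two lines (an assumption for illustration only).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=200)
labels = rng.integers(0, 2, size=200)
y = np.where(labels == 0, 2.0 * x + 1.0, -1.0 * x + 5.0)
y = y + rng.normal(0.0, 0.1, size=200)

betas, U = fuzzy_c_regressions(x, y, c=2)
print(betas)  # one (intercept, slope) row per fitted model
```

Like FCM, this scheme can settle in a local optimum depending on the initial memberships, which is the kind of sensitivity to initialization and outliers that the abstract's robust variants aim to reduce.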
14:05-15:05 Transfer Prototype-based Clustering
Prof. Zhaohong Deng (邓赵红)
Abstract: Traditional prototype-based clustering methods, such as the well-known fuzzy c-means (FCM) algorithm, usually need sufficient data to find a good clustering partition. If the available data are limited or scarce, most of these methods are no longer effective. While the data for the current clustering task may be scarce, some useful knowledge is usually available in related scenes or domains. In this study, the concept of transfer learning is applied to prototype-based fuzzy clustering (PFC). Specifically, the idea of leveraging knowledge from the source domain is exploited to develop a set of transfer prototype-based fuzzy clustering (TPFC) algorithms. First, two representative prototype-based fuzzy clustering algorithms, namely FCM and fuzzy subspace clustering (FSC), are combined with knowledge-leveraging mechanisms to develop the corresponding transfer clustering algorithms, under the assumption that the target domain (current scene) and the source domain (related scene) have the same number of clusters. Furthermore, two extended versions are proposed to implement transfer learning when the two domains have different numbers of clusters. Novel objective functions are proposed to integrate the knowledge from the source domain with the data in the target domain for clustering in the target domain. The proposed algorithms have been validated on different synthetic and real-world datasets. Experimental results demonstrate their effectiveness in comparison with both the original prototype-based fuzzy clustering algorithms and related clustering algorithms such as multi-task clustering and co-clustering.
15:05-15:30 Tea break
15:30-16:30 Extended Variational Inference for Non-Gaussian Statistical Models
Assoc. Prof. Zhanyu Ma (马占宇)
Abstract: Recent research demonstrates that the use of non-Gaussian statistical models is advantageous in applications where the data are not Gaussian distributed. With conventionally applied model estimation methods, e.g., maximum likelihood estimation and Bayesian estimation, we cannot derive an analytically tractable solution for non-Gaussian statistical models. In order to obtain a closed-form solution, we extend the commonly used variational inference (VI) framework via lower-bound approximation, by utilizing the convexity or relative convexity of the integrands in the non-Gaussian distributions. In this presentation, we introduce the principles of extended variational inference (EVI) and demonstrate its advantages in non-Gaussian mixture models and bounded support matrix factorization. We also show the advantages of non-Gaussian statistical models in real-life applications, such as speech coding, 3D depth map enhancement, and DNA methylation analysis. Here, we restrict our attention to non-Gaussian distributions in the exponential family.
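As a rough illustration of the lower-bound idea described in the abstract, the standard variational bound and one way of relaxing it can be written as below. The notation (observations X, latent variables and parameters Z, variational distribution q, and the tractable surrogate \tilde{p}) is assumed for this sketch and is not taken from the talk.

```latex
% Standard variational lower bound on the log evidence:
\ln p(X) \;\ge\; \mathcal{L}(q)
  = \mathbb{E}_{q}\!\left[\ln p(X, Z)\right] - \mathbb{E}_{q}\!\left[\ln q(Z)\right].

% If the expectation of \ln p(X, Z) has no closed form, assume a tractable
% surrogate \tilde{p} (obtained, e.g., from a convexity argument) such that
\mathbb{E}_{q}\!\left[\ln p(X, Z)\right] \;\ge\; \mathbb{E}_{q}\!\left[\ln \tilde{p}(X, Z)\right].

% Then a further, tractable lower bound follows and can be maximized instead:
\ln p(X) \;\ge\; \mathcal{L}(q) \;\ge\; \tilde{\mathcal{L}}(q)
  = \mathbb{E}_{q}\!\left[\ln \tilde{p}(X, Z)\right] - \mathbb{E}_{q}\!\left[\ln q(Z)\right].
```

Under these assumptions, maximizing the relaxed bound with respect to q can yield closed-form updates where the standard bound cannot.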