
A Dynamic Clustering Method with Missing Data
Abstract 【Objective】 To investigate the clustering of incomplete data, as commonly encountered in applied research. 【Method】 Auxiliary information from correlated variables is used to estimate the missing values and determine reasonable substitutes for them, which yields a "complete" data set. On this basis the expectation-maximization (EM) algorithm is iterated, so that the maximum likelihood estimates of the parameters and the substitute values for the missing data gradually converge. Each individual is then assigned to a cluster according to its Bayesian posterior probability, which realizes dynamic clustering. 【Result】 Simulation studies show that the missing-value substitution method converges well and clusters nearly all observations with missing values correctly. 【Conclusion】 Fisher's Iris data set confirmed the feasibility of the missing-value substitution method: its clustering accuracy was higher than that of the missing-value deletion method and close to that of clustering on the complete data.
Source: Scientia Agricultura Sinica (《中国农业科学》), 2012, No. 21, pp. 4534-4542 (9 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China, Youth Fund (31000539, 31100882); Jiangsu Provincial Key Laboratory Open Project (K10003)
Keywords: cluster analysis; missing data; posterior probability; maximum likelihood estimation
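The abstract describes the procedure only in outline, so the following Python sketch is an illustration under stated assumptions rather than the authors' implementation. It assumes Gaussian clusters, takes the substitute value of a missing entry to be the posterior-weighted conditional mean given the observed variables of the same individual, and computes posterior probabilities on the filled-in data. The function names (`log_gauss`, `cluster_with_missing`), the farthest-point initialization, and the small ridge term are all hypothetical choices made for this example.

```python
import numpy as np

def log_gauss(X, mean, cov):
    """Log density of each row of X under a multivariate normal N(mean, cov)."""
    d = X.shape[1]
    diff = X - mean
    sol = np.linalg.solve(cov, diff.T).T
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + np.sum(diff * sol, axis=1))

def cluster_with_missing(X, k, n_iter=50, ridge=1e-6):
    """Cluster rows of X (NaN marks a missing value) into k Gaussian clusters.

    Assumes every variable is observed at least once.
    Returns (labels, posterior probabilities, filled-in data).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    miss = np.isnan(X)
    # Initial substitute values: column means of the observed entries.
    filled = np.where(miss, np.nanmean(X, axis=0), X)
    # Assumed initialization: farthest-point means, shared diagonal covariance.
    centers = [filled[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((filled - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(filled[np.argmax(dist)])
    means = np.array(centers)
    init_cov = np.diag(np.var(filled, axis=0)) + ridge * np.eye(d)
    covs = np.array([init_cov.copy() for _ in range(k)])
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: Bayesian posterior probability of each cluster (filled data).
        logp = np.stack([np.log(weights[j]) + log_gauss(filled, means[j], covs[j])
                         for j in range(k)], axis=1)
        logp -= logp.max(axis=1, keepdims=True)
        post = np.exp(logp)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood estimates of the parameters.
        nk = post.sum(axis=0) + 1e-12
        weights = nk / n
        means = (post.T @ filled) / nk[:, None]
        for j in range(k):
            diff = filled - means[j]
            covs[j] = (post[:, j, None] * diff).T @ diff / nk[j] + ridge * np.eye(d)
        # Update substitutes: posterior-weighted conditional means given the
        # observed variables of the same row.
        for i in np.flatnonzero(miss.any(axis=1)):
            m, o = miss[i], ~miss[i]
            est = np.zeros(int(m.sum()))
            for j in range(k):
                cond = means[j][m]
                if o.any():
                    gain = np.linalg.solve(covs[j][np.ix_(o, o)],
                                           filled[i, o] - means[j][o])
                    cond = cond + covs[j][np.ix_(m, o)] @ gain
                est += post[i, j] * cond
            filled[i, m] = est
    # Labels come from the posterior computed in the last E-step.
    return post.argmax(axis=1), post, filled
```

Calling `cluster_with_missing(X, k=3)` on a data matrix with `NaN` entries returns a hard label per row, the posterior probabilities, and the filled-in data. A full EM treatment would also carry the conditional covariance of the missing entries into the covariance update; that correction is omitted here to keep the sketch short.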

