Abstract
Symbolic clustering is an important extension of conventional clustering, and interval data are a common type of symbolic data. The symmetric measures used in conventional clustering are not always suitable for interval data, and initialization has long been a serious problem that interferes with clustering. To address both issues, a measure suited to interval data, called mutual distance, is proposed, and affinity propagation, a recent clustering method that resolves the initialization problem, is applied on top of it. The result is an affinity propagation clustering algorithm for symbolic interval data based on mutual distance. Theoretical analysis and experimental comparison indicate that the proposed algorithm outperforms K-means based on Euclidean distance on interval data.
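The overall pipeline the abstract describes (a pairwise measure on intervals fed into affinity propagation) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define the mutual distance, so the Hausdorff distance between intervals is used here as a stand-in, and the function names, the toy intervals, the median preference, and the damping and iteration settings are all hypothetical choices rather than the authors' setup. The message-passing updates follow the standard affinity propagation rules, which do not require the similarity matrix to be symmetric and need no initial choice of cluster centers.

```python
from statistics import median

def interval_dissimilarity(u, v):
    # Hausdorff distance between intervals u = (lo, hi) and v = (lo, hi).
    # Stand-in only: the paper's mutual distance is not given in the abstract.
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))

def affinity_propagation(S, damping=0.8, iterations=300):
    """Standard affinity propagation on an n x n similarity matrix S.
    S[k][k] is the preference of point k to act as an exemplar."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best rival.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            runner_up = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                rival = runner_up if k == best else scores[best]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # Availability: support that candidate k has gathered from other points.
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            support = sum(positive) - positive[k]
            for i in range(n):
                if i == k:
                    target = support
                else:
                    target = min(0.0, R[k][k] + support - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * target
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    labels = [None] * n
    if exemplars:
        for i in range(n):
            # Exemplars label themselves; others join the most similar exemplar.
            labels[i] = i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
    return exemplars, labels

# Hypothetical toy data: six intervals given as (lower, upper) bounds.
intervals = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.7),
             (10.0, 12.0), (10.5, 12.2), (9.8, 11.7)]
n = len(intervals)
# Similarity = negative squared dissimilarity (a common, assumed choice).
S = [[-interval_dissimilarity(u, v) ** 2 for v in intervals] for u in intervals]
# Assumed preference: the median off-diagonal similarity.
preference = median(S[i][k] for i in range(n) for k in range(n) if i != k)
for k in range(n):
    S[k][k] = preference
exemplars, labels = affinity_propagation(S)
print(exemplars, labels)
```

Replacing `interval_dissimilarity` with the paper's mutual distance, once its definition is at hand, would leave the rest of the sketch unchanged, since the message-passing step only consumes the similarity matrix.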
Source
《计算机应用》
CSCD
Peking University Core Journal (北大核心)
2008, Issue 6, pp. 1441-1443, 1493 (4 pages)
Journal of Computer Applications
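Funding
National 863 Program of China (2007AA1Z158, 2006AA10Z313)
National Natural Science Foundation of China (60773206/F020106, 60704047/F030304)
Cross-Century Excellent Talents Support Program of the Ministry of Education, 2004 (NCET-04-0496)
Key Scientific Research Fund Project of the Ministry of Education, 2005 (105087)
Open Project of the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences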
Keywords
symbolic clustering
interval data
mutual distance
affinity propagation
K-means