Abstract
To address the low accuracy and parameter sensitivity of existing clustering algorithms for mixed-attribute data, this paper proposes RA-Clust, a clustering algorithm for mixed data based on residual analysis. The algorithm measures the similarity between objects with an improved entropy-weighted similarity measure for mixed attributes, and computes the density of each object with a proposed local density method based on KNN and Parzen windows. Cluster centers are pre-selected by linear regression and residual analysis, and the true cluster centers are then determined by a proposed objective optimization model for cluster centers. Finally, the remaining objects are assigned to the corresponding clusters according to their minimum distance to higher-density objects, forming the final clustering. Experimental results on synthetic datasets and UCI datasets verify the effectiveness of the algorithm. Compared with similar algorithms, RA-Clust achieves higher clustering accuracy.
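The pipeline summarized in the abstract can be illustrated with a rough sketch. The abstract gives no formulas, so everything concrete below is an assumption rather than the authors' method. Plain Euclidean distance on numeric data stands in for the entropy-weighted mixed-attribute similarity. The Gaussian kernel and its bandwidth are one possible reading of "KNN and Parzen window" density. The threshold `z` on standardized residuals is a hypothetical choice. The objective optimization step that confirms the true centers is omitted entirely, so the pre-selected candidates are used directly as centers.

```python
import numpy as np

def knn_parzen_density(dist, k):
    """Gaussian-kernel (Parzen-style) density over each object's k nearest neighbours."""
    # Sort each row; column 0 is the object itself (distance 0), so skip it.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    h = knn.mean()  # assumed bandwidth: mean k-NN distance over the whole data set
    return np.exp(-(knn / h) ** 2).sum(axis=1)

def delta_and_parent(dist, rho):
    """For each object: distance to, and index of, the nearest higher-density object."""
    n = len(rho)
    order = np.argsort(-rho, kind="stable")  # densest first
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dist[i].max()  # densest object has no higher-density neighbour
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i], parent[i] = dist[i, j], j
    return delta, parent, order

def preselect_centers(rho, delta, z=3.0):
    """Fit delta ~ a*rho + b; keep objects with a large positive standardized residual."""
    a, b = np.polyfit(rho, delta, 1)
    resid = delta - (a * rho + b)
    return np.flatnonzero(resid / resid.std() > z)

def ra_clust_sketch(X, k=5, z=3.0):
    """Hypothetical simplification: candidates from residual analysis become the centers."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = knn_parzen_density(dist, k)
    delta, parent, order = delta_and_parent(dist, rho)
    centers = preselect_centers(rho, delta, z)
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(len(centers))
    for i in order:  # densest first, so a parent is labelled before its children
        if labels[i] == -1 and parent[i] != -1:
            labels[i] = labels[parent[i]]
    return labels, centers

# Usage on two well-separated synthetic blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (40, 2)), rng.normal(10.0, 0.3, (40, 2))])
labels, centers = ra_clust_sketch(X)
```

The residual step reflects the idea that most objects lie close to a higher-density neighbour, so an object whose distance to the nearest higher-density object is far above the fitted regression line is a natural center candidate. An object left with label `-1` in this sketch would mean that the densest object of its chain was not selected as a candidate.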
Authors
邱保志
张瑞霖
李向丽
QIU Bao-Zhi; ZHANG Rui-Lin; LI Xiang-Li (School of Information Engineering, Zhengzhou University, Zhengzhou 450001)
Source
《自动化学报》
EI
CSCD
Peking University Core Journals
2020, No. 7, pp. 1420-1432 (13 pages)
Acta Automatica Sinica
Funding
Supported by the Basic and Frontier Technology Research Program of Henan Province (152300410191).