Abstract
PAM is the most representative of the K-medoids algorithms. Its similarity measure implicitly assumes that the attributes of data objects are independent and identically distributed (IID), and it computes distances with the Euclidean distance formula. In real data sets, however, attributes are generally non-IID: they are correlated with one another. For numerical data, this paper therefore introduces a non-IID similarity formula into the PAM algorithm, replacing the Pearson correlation coefficient originally used in that formula with the Spearman rank correlation coefficient, and verifies the approach experimentally. The results show that introducing the non-IID formula for numerical data clearly improves the clustering accuracy of PAM.
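The idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not state the paper's non-IID formula, so the `coupled_distance` below is a hypothetical stand-in: it mixes attribute differences through a Spearman rank correlation matrix before taking the Euclidean norm. The function names, the naive medoid initialization, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def spearman_matrix(data):
    """Attribute-by-attribute Spearman correlation matrix (diagonal set to 1)."""
    cols = [list(c) for c in zip(*data)]
    m = len(cols)
    return [[1.0 if a == b else spearman(cols[a], cols[b]) for b in range(m)]
            for a in range(m)]

def coupled_distance(x, y, corr):
    """ASSUMPTION: illustrative non-IID distance, not the paper's formula.
    Each attribute difference is replaced by a correlation-weighted sum of
    all attribute differences, then the Euclidean norm is taken."""
    diff = [a - b for a, b in zip(x, y)]
    mixed = [sum(corr[j][k] * diff[k] for k in range(len(diff)))
             for j in range(len(diff))]
    return math.sqrt(sum(v * v for v in mixed))

def total_cost(dist, medoids):
    """Sum over all objects of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)

def pam(dist, k):
    """Simplified PAM: naive initialization, then greedy swaps until no
    single medoid/non-medoid swap lowers the total cost."""
    n = len(dist)
    medoids = list(range(k))  # ASSUMPTION: first k objects, not PAM's BUILD step
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels

# Toy numerical data (hypothetical): two groups with correlated attributes.
data = [
    [1.0, 2.1, 0.5], [1.2, 2.3, 0.4], [0.9, 1.9, 0.6],
    [8.0, 16.2, 3.1], [8.3, 16.5, 3.0], [7.9, 15.8, 3.3],
]
corr = spearman_matrix(data)
dist = [[coupled_distance(a, b, corr) for b in data] for a in data]
medoids, labels = pam(dist, k=2)
print(medoids, labels)
```

Because PAM only needs a pairwise distance matrix, swapping the Euclidean distance for a correlation-aware one leaves the medoid search itself unchanged. Using ranks rather than raw values makes the correlation estimate sensitive to monotonic rather than strictly linear relationships, which is the usual motivation for preferring Spearman over Pearson.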
Authors
HAN Bing (韩冰)
JIANG He (姜合)
School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China
Source
Journal of Qilu University of Technology (《齐鲁工业大学学报》)
2019, No. 2, pp. 56-61 (6 pages)
Funding
Young Scientists Fund of the National Natural Science Foundation of China (61502259)
Keywords
clustering
PAM algorithm
similarity
non-independent and identically distributed (non-IID)