Abstract
This paper proposes an effective method for clustering protein sequences that combines the affinity propagation (AP) clustering algorithm with a post-processing step. Post-processing is applied to the AP output to improve the quality of the resulting clusters. To measure the similarity between protein sequences, an improved alignment-free similarity measure is presented. The method is evaluated against other algorithms on six protein sequence data sets, and the experimental results show that it analyzes protein sequences effectively.
Source
Journal of Software (《软件学报》)
EI
CSCD
PKU Core
2011, Issue 8, pp. 1827-1837 (11 pages)
Funding
National Natural Science Foundation of China (60671033)
Keywords
pattern recognition
cluster analysis
sequence analysis
protein sequence