Abstract
This paper analyzes in detail the shortcomings of BIRCH (Balanced Iterative Reducing and Clustering Using Hierarchies), a clustering algorithm for large datasets, and proposes an improved multi-threshold BIRCH algorithm based on the sum of squared deviations. The thresholds in the CF-tree use the sum of squared deviations to establish the relatedness between clusters, which improves on relying solely on the distance between cluster centers. The split factor is set to the maximum cluster diameter, which avoids the drawback of choosing it from empirical values. Finally, the improved algorithm is applied to a preliminary cluster analysis of graphical representation data of gene sequences.
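The abstract does not give the paper's formulas, so the following is only a minimal sketch of the two ideas it names, built on the standard BIRCH clustering feature CF = (N, LS, SS). It assumes that "sum of squared deviations" means the within-cluster sum of squared distances to the centroid, and that cluster relatedness is scored by how much this sum grows when two clusters are merged. The names `CF`, `sse_increase`, `centroid_distance`, and `max_diameter` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class CF:
    """Standard BIRCH clustering feature: point count, linear sum, squared sum."""
    n: int
    ls: Tuple[float, ...]  # per-dimension sum of the points
    ss: float              # sum of squared norms of the points

    @classmethod
    def from_points(cls, points: Sequence[Sequence[float]]) -> "CF":
        dim = len(points[0])
        ls = tuple(sum(p[i] for p in points) for i in range(dim))
        ss = sum(x * x for p in points for x in p)
        return cls(len(points), ls, ss)

    def merge(self, other: "CF") -> "CF":
        # CFs are additive, so merging never revisits the raw points.
        ls = tuple(a + b for a, b in zip(self.ls, other.ls))
        return CF(self.n + other.n, ls, self.ss + other.ss)

    def centroid(self) -> Tuple[float, ...]:
        return tuple(x / self.n for x in self.ls)

    def sse(self) -> float:
        # Sum of squared deviations from the centroid: SS - |LS|^2 / N.
        return self.ss - sum(x * x for x in self.ls) / self.n

    def diameter(self) -> float:
        # Root of the average squared pairwise distance inside the cluster.
        if self.n < 2:
            return 0.0
        ls_sq = sum(x * x for x in self.ls)
        sq = (2 * self.n * self.ss - 2 * ls_sq) / (self.n * (self.n - 1))
        return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


def centroid_distance(a: CF, b: CF) -> float:
    """Baseline criterion: Euclidean distance between cluster centers."""
    return math.dist(a.centroid(), b.centroid())


def sse_increase(a: CF, b: CF) -> float:
    """Assumed criterion: growth of the sum of squared deviations on merging."""
    return a.merge(b).sse() - a.sse() - b.sse()


def max_diameter(clusters: Sequence[CF]) -> float:
    """Assumed split factor: the largest diameter among the given clusters."""
    return max(c.diameter() for c in clusters)
```

For a cluster holding the points (0, 0) and (2, 0) and a singleton at (10, 0), the center distance is 9 while the merge raises the sum of squared deviations by 54. Unlike the center distance alone, the second score also depends on the cluster sizes, which is one plausible reading of why the abstract prefers it.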
Source
Science Technology and Engineering (《科学技术与工程》)
2008, No. 10, pp. 2579-2583, 2588 (6 pages)
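Funding
Supported by the Key Program of the Natural Science Foundation of Hunan Province [06JJ4076]
Supported by the Hunan Provincial Department of Finance Fund [200590]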
Keywords
BIRCH algorithm
clustering feature
gene graphical representation data