Abstract
With the development of gene microarray technology, biclustering methods were first applied to the study of high-dimensional gene expression data. Because most high-dimensional data are sparse, principal component analysis (PCA) is used to map them into a low-dimensional space, and clustering is then carried out in that space. Different clustering methods produce different results, and the same method applied to different high-dimensional data sets also produces different results. Therefore, the clustering tendency of feature sets from Alzheimer's disease gene expression data is first evaluated, and an improved hierarchical clustering algorithm with a threshold δ is then described. Since previous work has proposed different rules for computing δ, hierarchical clustering under each of these δ values is compared, and each result is evaluated with the silhouette coefficient.
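The dimensionality-reduction and clustering-tendency steps can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: PCA is computed via the SVD of the centred data matrix, and because the abstract does not name the tendency measure, the basic Hopkins statistic is used here as one common choice. The function names `pca_reduce` and `hopkins` and the synthetic data are hypothetical.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X (samples x genes) onto the top-k principal components."""
    Xc = X - X.mean(axis=0)  # centre each feature (gene)
    # Economy-size SVD: rows of Vt are the principal axes, sorted by singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # scores in the k-dimensional subspace


def hopkins(X, m=None, rng=None):
    """Basic Hopkins statistic H in [0, 1] (plain distances, no power of d).

    H near 1 suggests cluster structure; H near 0.5 suggests uniformly
    scattered points. Assumption: used here as one common tendency measure.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    m = m or max(1, n // 10)
    # u: distance from m uniform points in the data's bounding box to the nearest data point
    lo, hi = X.min(axis=0), X.max(axis=0)
    P = rng.uniform(lo, hi, size=(m, d))
    u = np.linalg.norm(P[:, None, :] - X[None, :, :], axis=2).min(axis=1)
    # w: distance from m sampled data points to their nearest *other* data point
    idx = rng.choice(n, size=m, replace=False)
    D = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    D[np.arange(m), idx] = np.inf  # exclude each point's zero distance to itself
    w = D.min(axis=1)
    return u.sum() / (u.sum() + w.sum())


# Hypothetical stand-in for an expression matrix: 100 samples x 50 genes, two groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 50)), rng.normal(10.0, 1.0, (50, 50))])
Z = pca_reduce(X, 2)
H = hopkins(Z, m=10, rng=1)
print(Z.shape, round(H, 3))
```

Running the tendency check on the reduced scores `Z` rather than on the raw matrix mirrors the order described in the abstract: reduce first, then ask whether the low-dimensional data are worth clustering.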
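A generic threshold-based hierarchical clustering, compared across δ values with the silhouette coefficient, might look like the sketch below. It assumes average linkage and treats δ simply as a merge-distance cut-off; the paper's improvements and the published rules for computing δ are not reproduced, so the δ values 3.0 and 11.0 and the names `threshold_hierarchical`, `silhouette` and `pairwise` are illustrative assumptions.

```python
import numpy as np


def pairwise(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def threshold_hierarchical(X, delta):
    """Average-linkage agglomerative clustering that stops once the closest
    pair of clusters is farther apart than delta. Returns integer labels."""
    D = pairwise(X)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average linkage: mean distance over all cross-cluster pairs
                dist = D[np.ix_(clusters[a], clusters[b])].mean()
                if dist < best:
                    best, pair = dist, (a, b)
        if best > delta:  # delta threshold: no pair is close enough to merge
            break
        a, b = pair
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def silhouette(X, labels):
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b); singletons score 0."""
    D = pairwise(X)
    ks = np.unique(labels)
    if len(ks) < 2:
        raise ValueError("silhouette needs at least two clusters")
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, excluding i
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])  # nearest other cluster
        s[i] = (b - a) / max(a, b)
    return s.mean()


# Hypothetical data: three well-separated 2-D groups of 15 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, (15, 2)) for c in centers])
results = {delta: threshold_hierarchical(X, delta) for delta in (3.0, 11.0)}  # illustrative delta values
for delta, labels in results.items():
    print(delta, len(np.unique(labels)), round(silhouette(X, labels), 3))
```

Because average linkage produces no inversions, stopping at the first merge distance above δ is equivalent to cutting the dendrogram at height δ. On this toy data the smaller δ recovers the three groups, while the larger δ merges two of them and yields a lower silhouette coefficient, which is the kind of comparison the abstract describes.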
Source
Journal of Sichuan Normal University (Natural Science)
CAS
Peking University Core Journal
2015, No. 6, pp. 925-929 (5 pages)
Funding
Aeronautical Science Foundation of China (2012ZD11)
Keywords
hierarchical clustering
threshold
gene expression data