Abstract
Clustering methods play an important role in the analysis of gene expression data. However, gene expression data have characteristics that distinguish them from data in other fields, so traditional distance definitions and clustering methods can no longer fully meet researchers' requirements for analyzing biological data. A distance measure based on the Poisson distribution, TransChisq, is presented; it defines the distance between genes from a new perspective grounded in their biological meaning. Because fuzzy clustering can describe complex interactions among genes more thoroughly, the TransChisq distance is combined with fuzzy clustering to improve the fuzzy C-means (FCM) algorithm, and the improved algorithm is applied to real gene expression data. Experimental results show that the method clusters gene expression data in agreement with their true biological classification and discovers more co-regulated genes, better meeting the needs of gene expression data analysis.
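The abstract does not give the TransChisq formula, so the following is only a minimal sketch, assuming a Poisson count model, of the general idea: replace the Euclidean distance inside fuzzy C-means with a chi-square-style dissimilarity that compares each gene's counts with the counts expected from a cluster's profile shape. The function `poisson_chisq_distance` is a hypothetical stand-in, not the authors' TransChisq definition. The fuzzifier `m = 2` and the weighted-mean center update are borrowed from standard FCM and are likewise assumptions.

```python
import numpy as np

def poisson_chisq_distance(x, center, eps=1e-12):
    """HYPOTHETICAL stand-in for TransChisq (not the paper's formula).

    Assumes counts x ~ Poisson with a shape given by `center`:
    expected_t = sum(x) * center_t / sum(center), then returns the
    chi-square statistic sum_t (x_t - expected_t)^2 / expected_t.
    """
    x = np.asarray(x, dtype=float)
    shape = np.asarray(center, dtype=float)
    shape = shape / max(shape.sum(), eps)      # normalize center to a profile shape
    expected = x.sum() * shape                 # expected counts under that shape
    return float(np.sum((x - expected) ** 2 / (expected + eps)))

def fuzzy_c_means(X, c, dist, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard FCM loop with a pluggable dissimilarity `dist`.

    `dist` is treated like a squared distance, so memberships use the
    exponent 1 / (m - 1). Centers use the usual FCM weighted mean (an
    assumption: it is not the exact minimizer for a chi-square objective).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)          # each gene's memberships sum to 1
    centers = None
    for _ in range(max_iter):
        W = U ** m
        centers = (W @ X) / W.sum(axis=1, keepdims=True)
        D = np.array([[dist(X[j], centers[i]) for j in range(n)] for i in range(c)])
        D = np.fmax(D, 1e-12)                  # avoid division by zero on a center
        # ratio[i, k, j] = D[i, j] / D[k, j]; u_ij = 1 / sum_k ratio^(1/(m-1))
        ratio = (D[:, None, :] / D[None, :, :]) ** (1.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U, centers

# Toy count data (made up for illustration): three rising and three falling profiles.
X = np.array([
    [10, 20, 40], [5, 10, 20], [20, 41, 79],
    [40, 20, 10], [21, 10, 5], [80, 39, 20],
])
U, centers = fuzzy_c_means(X, c=2, dist=poisson_chisq_distance)
labels = U.argmax(axis=0)                      # hard labels from fuzzy memberships
```

Because the dissimilarity depends only on the shape of the center, genes with proportional counts but different expression levels (for example `[5, 10, 20]` and `[20, 41, 79]`) fall into the same cluster, while each gene still keeps a graded membership in every cluster through `U`.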
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal (北大核心)
2010, No. 7, pp. 32-33, 38 (3 pages)
Keywords
fuzzy C-means clustering
gene expression data
distance