Abstract
Clustering is an effective way to identify key gene regulatory modules in gene expression data, and the similarity measure between gene expression profiles is a key issue in clustering. However, general similarity measures cannot capture the complex gene regulatory relationships found in time-course gene expression data, such as time-delayed, inverse, and local (transient) correlation. For time-course gene expression data, a similarity measure and a clustering algorithm based on affinity propagation and dynamic programming are proposed. On a gene expression dataset of hepatocytes during rat liver regeneration, the clustering results are highly consistent with the results of gene function enrichment analysis, which demonstrates that the algorithm is effective for clustering time-course gene expression data.
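The abstract does not spell out the similarity measure, so the following is only a minimal sketch of one way to combine dynamic programming with affinity propagation for time-course profiles; it is not the authors' algorithm. Assumptions: each profile is z-scored; for every lag up to a hypothetical `max_delay`, a Kadane-style dynamic-programming recurrence finds the best contiguous run of products (local, possibly time-delayed correlation) and the best run of negated products (inverse correlation); the absolute score serves as the similarity, the median similarity serves as the affinity propagation preference, and the damping factor is 0.5. All function names are illustrative.

```python
import math
import statistics


def zscore(v):
    """Standardize a profile so that scale and offset do not affect similarity."""
    mu = statistics.fmean(v)
    sd = statistics.pstdev(v)
    if sd == 0:
        return [0.0] * len(v)  # flat profile: nothing to correlate
    return [(a - mu) / sd for a in v]


def local_similarity(x, y, max_delay=2):
    """Hypothetical DP similarity: best contiguous run of products over lags.

    For each delay d, x[i] is paired with y[i + d]. A Kadane-style recurrence
    keeps the best positive run (local, possibly delayed correlation) and the
    best run of negated products (inverse correlation). Returns (score, delay)
    with score in [-1, 1]; a negative score means inverse correlation.
    """
    x, y = zscore(x), zscore(y)
    n = len(x)
    best_pos = best_neg = 0.0
    pos_delay = neg_delay = 0
    for d in range(-max_delay, max_delay + 1):
        pos = neg = 0.0
        for i in range(n):
            j = i + d
            if not 0 <= j < n:
                continue
            p = x[i] * y[j]
            pos = max(0.0, pos + p)  # best positive run ending at i
            neg = max(0.0, neg - p)  # best inverse run ending at i
            if pos > best_pos:
                best_pos, pos_delay = pos, d
            if neg > best_neg:
                best_neg, neg_delay = neg, d
    if best_pos >= best_neg:
        return best_pos / n, pos_delay
    return -best_neg / n, neg_delay


def similarity_matrix(profiles, max_delay=2):
    """Pairwise |local similarity|; the diagonal holds the AP preference."""
    n = len(profiles)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            score, _ = local_similarity(profiles[i], profiles[k], max_delay)
            S[i][k] = S[k][i] = abs(score)  # inverse correlation counts as similar
    off = [S[i][k] for i in range(n) for k in range(n) if i != k]
    pref = statistics.median(off) if off else 0.0  # assumed: median preference
    for i in range(n):
        S[i][i] = pref
    return S


def affinity_propagation(S, damping=0.5, iterations=200):
    """Responsibility/availability message passing; returns exemplar labels."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        for i in range(n):  # responsibilities r(i, k)
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max((vals[k] for k in range(n) if k != first), default=-math.inf)
            for k in range(n):
                rival = second if k == first else vals[first]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        for k in range(n):  # availabilities a(i, k)
            pos_sum = sum(max(0.0, R[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    new = pos_sum
                else:
                    new = min(0.0, R[k][k] + pos_sum - max(0.0, R[i][k]))
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:  # fall back to the single strongest candidate
        exemplars = [max(range(n), key=lambda k: A[k][k] + R[k][k])]
    # Each exemplar labels itself; other genes join their most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
            for i in range(n)]


if __name__ == "__main__":
    genes = [
        [0, 2, 4, 2, 0, -2, -4, -2],  # wave
        [-2, 0, 2, 4, 2, 0, -2, -4],  # same wave, delayed one step
        [0, -2, -4, -2, 0, 2, 4, 2],  # inverted wave
        [0, 0, 0, 0, 5, 5, 5, 5],     # late step
        [1, 0, 1, 0, 6, 5, 6, 5],     # noisy late step
        [5, 5, 5, 5, 0, 0, 0, 0],     # inverted step
    ]
    print(affinity_propagation(similarity_matrix(genes)))
```

Taking the absolute value treats an inversely correlated pair as just as related as a positively correlated one, in line with the abstract's treatment of inverse correlation as a regulatory relationship; a signed variant would place such genes in separate clusters.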
Source
Journal of Henan Normal University (Natural Science Edition), 2015, No. 6, pp. 134-140 (7 pages)
Indexed in: CAS; Peking University Core Journals
Funding
Special Pre-research Fund of the National 973 Program (2012CB722304)
Henan Province Basic and Frontier Research Program (122300410355)
Keywords
affinity propagation
time-course
inverse correlation
transient correlation
gene expression profile