Abstract
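Identifying transcription factor binding sites plays an important role in understanding the regulation of gene expression. This paper proposes GOBMD (Genetic Optimization with Bayesian Model for Motif Discovery), a genetic optimization algorithm for motif discovery driven by a Bayesian model. GOBMD first applies a projection procedure based on position-weighted hashing, which projects the l-mers of the input sequences onto a k-dimensional (k<l) subspace to find good starting candidate motifs in the DNA sequences; these candidates form the initial population of the genetic algorithm and are then refined further. During the genetic iterations, a fitness function that incorporates the Bayesian model guides the evolution. Experimental results on simulated data show that, compared with the motif discovery algorithms Gibbs, WINNOWER, SP-STAR, and PROJECTION, GOBMD performs well on planted (l,d)-motif discovery and can solve most of the challenging planted (l,d)-motif problems. In addition, the authors use box plots to show the distribution of the performance coefficients of these algorithms on the simulated data; the results indicate that GOBMD is efficient. Experiments on real biological sequences likewise demonstrate the effectiveness of GOBMD.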
Transcription factor binding site (TFBS) detection plays an important role in gene finding and in understanding gene regulatory relationships. Motifs are weakly conserved, so motif discovery is a challenging problem. We propose a new approach called Genetic Optimization with Bayesian Model for Motif Discovery (GOBMD). GOBMD first uses a position-weight hashing based projection, which maps the l-mers in the DNA sequences into k-dimensional subspaces (k<l), to find good starting candidate motifs. GOBMD then employs a genetic refinement step to evolve the candidate motifs for further optimization. GOBMD also incorporates the Bayesian formula and relative entropy in its fitness function to find the best configuration of site locations. Experimental results on simulated data show that GOBMD can compete with Gibbs, WINNOWER, SP-STAR, and PROJECTION on most implanted (l,d)-motif finding problems. We compare the performance coefficient scores on the (l,d)-motif finding problems by making separate box plots for each of the algorithms listed above. Experimental results on real biological data, obtained by identifying a number of known transcriptional regulatory motifs in eukaryotes, also show that GOBMD can predict TFBSs efficiently.
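The projection and relative-entropy ideas mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the position-weight hash or the Bayesian fitness, so this sketch assumes a plain random projection in the style of PROJECTION (each l-mer is bucketed by the letters at k chosen positions) and a simple relative-entropy score against a uniform background. The function names `project_lmers` and `relative_entropy`, the pseudocount, and the toy sequences are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import defaultdict


def project_lmers(sequences, l, positions):
    """Bucket every l-mer by the letters found at the chosen k positions.

    Assumption: plain projection hashing, not the paper's position-weight hash.
    Returns a dict mapping the k-letter key to a list of (sequence index, start).
    """
    buckets = defaultdict(list)
    for seq_idx, seq in enumerate(sequences):
        for start in range(len(seq) - l + 1):
            lmer = seq[start:start + l]
            key = "".join(lmer[p] for p in positions)
            buckets[key].append((seq_idx, start))
    return buckets


def relative_entropy(lmers, background=None, pseudocount=1.0):
    """Relative entropy (bits) of the column frequencies of aligned l-mers.

    Assumption: uniform background over A, C, G, T unless one is supplied.
    """
    alphabet = "ACGT"
    background = background or {b: 0.25 for b in alphabet}
    total = len(lmers) + pseudocount * len(alphabet)
    score = 0.0
    for col in range(len(lmers[0])):
        counts = {b: pseudocount for b in alphabet}
        for lmer in lmers:
            counts[lmer[col]] += 1
        for b in alphabet:
            f = counts[b] / total
            if f > 0:  # treat 0 * log(0) as 0
                score += f * math.log2(f / background[b])
    return score


# Toy usage: pick k of the l positions at random, bucket the l-mers, and
# take the bucket that covers the most sequences as a starting candidate.
sequences = ["TTACGTACGGA", "GGACGTTCGAT", "CACGAACGTTT"]
l, k = 6, 3
rng = random.Random(0)
positions = sorted(rng.sample(range(l), k))
buckets = project_lmers(sequences, l, positions)
best_key, hits = max(buckets.items(), key=lambda kv: len({s for s, _ in kv[1]}))
candidates = [sequences[s][p:p + l] for s, p in hits]
print(best_key, candidates, round(relative_entropy(candidates), 3))
```

In a scheme like the one the abstract describes, buckets that collect l-mers from many sequences would seed the initial population, and a score of this kind would be one ingredient of the fitness used during refinement; the genetic operators and the Bayesian term are not reproduced here.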
Source
《计算机学报》
EI
CSCD
Peking University Core Journals
2012, Issue 7, pp. 1429-1439 (11 pages)
Chinese Journal of Computers
Funding
National Natural Science Foundation of China (69601003)
Doctoral Program Fund (20100203110010)
Young Scientists Fund (60705004)
Keywords
motif identification (模体识别)
genetic algorithm (遗传算法)
Bayesian model (贝叶斯模型)
hashing (散列)
projection (投影)