
Genetic Optimization for (l,d)-Motif Discovery
Cited by: 6
Abstract: Detecting transcription factor binding sites (TFBSs) plays an important role in gene finding and in understanding gene regulation. Motifs are only weakly conserved, which makes motif discovery a challenging problem. We propose a new approach called Genetic Optimization with Bayesian Model for Motif Discovery (GOBMD). GOBMD first uses a projection based on position-weighted hashing, which maps the l-mers in the input DNA sequences into k-dimensional subspaces (k < l), to find good starting candidate motifs. These candidates form the initial population of a genetic algorithm, which GOBMD then uses to refine them further. During the genetic iterations, a fitness function that incorporates the Bayesian formula and relative entropy guides the evolution toward the best configuration of site locations. Experimental results on simulated data show that, compared with the motif discovery algorithms Gibbs, WINNOWER, SP-STAR, and PROJECTION, GOBMD performs well on implanted (l,d)-motif discovery and can solve most of the challenging implanted (l,d)-motif problems. We also compare the distributions of the performance coefficient on the simulated data with a separate box plot for each of these algorithms; the results indicate that GOBMD is efficient. Experiments on real biological sequences, in which GOBMD identifies a number of known transcriptional regulatory motifs in eukaryotes, likewise show that it can predict TFBSs effectively.
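The two ideas the abstract relies on, the implanted (l,d)-motif problem and the projection of l-mers onto k positions, can be illustrated with a minimal Python sketch. This is not the GOBMD implementation: the abstract does not specify the position-weighted hash, so the sketch assumes plain random projection (k positions chosen uniformly), and it assumes each implanted instance differs from the motif in exactly d positions. The function names `plant_motif` and `project` are hypothetical.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def plant_motif(n_seqs, seq_len, motif, d, rng):
    """Build random DNA sequences, each holding one copy of `motif`
    mutated at exactly d positions (an assumed reading of "(l,d)")."""
    l = len(motif)
    seqs, sites = [], []
    for _ in range(n_seqs):
        background = [rng.choice("ACGT") for _ in range(seq_len)]
        instance = list(motif)
        for pos in rng.sample(range(l), d):
            # Replace with a different base so the mutation always counts.
            instance[pos] = rng.choice([c for c in "ACGT" if c != instance[pos]])
        start = rng.randrange(seq_len - l + 1)
        background[start:start + l] = instance
        seqs.append("".join(background))
        sites.append(start)
    return seqs, sites


def project(seqs, l, positions):
    """Hash every l-mer by the characters at the k chosen positions.
    Plain random projection; the paper's position weighting is not modeled."""
    buckets = defaultdict(list)
    for i, s in enumerate(seqs):
        for j in range(len(s) - l + 1):
            key = "".join(s[j + p] for p in positions)
            buckets[key].append((i, j))  # (sequence index, start offset)
    return buckets


if __name__ == "__main__":
    rng = random.Random(0)
    motif = "ACGTACGTACGTACG"          # l = 15, illustrative only
    seqs, sites = plant_motif(20, 600, motif, 4, rng)
    positions = sorted(rng.sample(range(len(motif)), 7))  # k = 7 < l
    buckets = project(seqs, len(motif), positions)
    # Buckets holding l-mers from many sequences are candidate seeds.
    best = max(buckets.values(), key=len)
    print(len(buckets), len(best))
```

Buckets that collect l-mers from many different sequences are the kind of starting candidates that a refinement stage, such as the genetic algorithm described in the abstract, would then improve.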
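Source: Chinese Journal of Computers (《计算机学报》; indexed in EI, CSCD, and the Peking University Core list), 2012, No. 7, pp. 1429-1439 (11 pages)
Funding: Supported by the National Natural Science Foundation of China (69601003), the Doctoral Program Foundation (20100203110010), and the Youth Science Foundation (60705004).
Keywords: motif identification; genetic algorithm; Bayesian model; hashing; projection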