转录因子结合位点识别在基因表达调控过程中起着重要的作用.文中提出了一种贝叶斯模型驱动的模体识别的遗传优化算法GOBMD(Genetic Optimization with Bayesian Model for Motif Discovery).GOBMD首先使用一个基于位置加权散列的投影过...转录因子结合位点识别在基因表达调控过程中起着重要的作用.文中提出了一种贝叶斯模型驱动的模体识别的遗传优化算法GOBMD(Genetic Optimization with Bayesian Model for Motif Discovery).GOBMD首先使用一个基于位置加权散列的投影过程,将输入序列中的l-mers投影到k维(k<l)子空间,找出DNA序列中的起始良好候选模体,作为遗传算法的初始群体,以进一步求精.在遗传迭代过程中,采用结合贝叶斯模型的适应度函数指导进化过程.模拟数据的实验结果表明,与Gibbs、WINNOWER、SP-STAR、PROJECTION这些模体识别算法相比,GOBMD在对植入(l,d)-模体识别时有较好的性能,能够解决大部分挑战性的植入(l,d)-模体识别问题.此外,作者用Boxplot显示了上述模体识别算法在模拟数据识别上的性能系数分布,结果表明GOBMD具有较好的效率.针对真实生物序列的实验结果同样表明了GOBMD算法的有效性.展开更多
In mammalian cells, transcribed enhancers(TrEns) play important roles in the initiation of gene expression and maintenance of gene expression levels in a spatiotemporal manner. One of the most challenging questions is...In mammalian cells, transcribed enhancers(TrEns) play important roles in the initiation of gene expression and maintenance of gene expression levels in a spatiotemporal manner. One of the most challenging questions is how the genomic characteristics of enhancers relate to enhancer activities. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving space for exploring the enhancers’ DNA code in a more systematic way. To address this problem, we developed a novel computational framework, Transcribed Enhancer Landscape Search(TELS), aimed at identifying predictive cell type/tissue-specific motif signatures of TrEns.As a case study, we used TELS to compile a comprehensive catalog of motif signatures for all known TrEns identified by the FANTOM5 consortium across 112 human primary cells and tissues.Our results confirm that combinations of different short motifs characterize in an optimized manner cell type/tissue-specific TrEns. Our study is the first to report combinations of motifs that maximize classification performance of TrEns exclusively transcribed in one cell type/tissue from TrEns exclusively transcribed in different cell types/tissues. Moreover, we also report 31 motif signatures predictive of enhancers’ broad activity. TELS codes and material are publicly available at http://www.cbrc.kaust.edu.sa/TELS.展开更多
The structure-based sequence motif of the distant proteins in evolution, protein tyrosine phosphatases (PTP) I and Ⅱ superfamilies, as an example, has been defined by the structural comparison, structure-based sequen...The structure-based sequence motif of the distant proteins in evolution, protein tyrosine phosphatases (PTP) I and Ⅱ superfamilies, as an example, has been defined by the structural comparison, structure-based sequence alignment and analyses on substitution patterns of residues in common sequence conserved regions. And the phosphatases I and Ⅱ can be correctly identified together by the structure-based PTP sequence motif from SWISS-PROT and TrEBML databases. The results show that the correct rates of identification are over 98%. This is the first time to identify PTP I and Ⅱ together by this motif.展开更多
文摘转录因子结合位点识别在基因表达调控过程中起着重要的作用.文中提出了一种贝叶斯模型驱动的模体识别的遗传优化算法GOBMD(Genetic Optimization with Bayesian Model for Motif Discovery).GOBMD首先使用一个基于位置加权散列的投影过程,将输入序列中的l-mers投影到k维(k<l)子空间,找出DNA序列中的起始良好候选模体,作为遗传算法的初始群体,以进一步求精.在遗传迭代过程中,采用结合贝叶斯模型的适应度函数指导进化过程.模拟数据的实验结果表明,与Gibbs、WINNOWER、SP-STAR、PROJECTION这些模体识别算法相比,GOBMD在对植入(l,d)-模体识别时有较好的性能,能够解决大部分挑战性的植入(l,d)-模体识别问题.此外,作者用Boxplot显示了上述模体识别算法在模拟数据识别上的性能系数分布,结果表明GOBMD具有较好的效率.针对真实生物序列的实验结果同样表明了GOBMD算法的有效性.
基金supported by the base funding (Grant No. BAS/1/1606-01-01) to VBB by the King Abdullah University of Science and Technology (KAUST), Saudi Arabia
文摘In mammalian cells, transcribed enhancers(TrEns) play important roles in the initiation of gene expression and maintenance of gene expression levels in a spatiotemporal manner. One of the most challenging questions is how the genomic characteristics of enhancers relate to enhancer activities. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving space for exploring the enhancers’ DNA code in a more systematic way. To address this problem, we developed a novel computational framework, Transcribed Enhancer Landscape Search(TELS), aimed at identifying predictive cell type/tissue-specific motif signatures of TrEns.As a case study, we used TELS to compile a comprehensive catalog of motif signatures for all known TrEns identified by the FANTOM5 consortium across 112 human primary cells and tissues.Our results confirm that combinations of different short motifs characterize in an optimized manner cell type/tissue-specific TrEns. Our study is the first to report combinations of motifs that maximize classification performance of TrEns exclusively transcribed in one cell type/tissue from TrEns exclusively transcribed in different cell types/tissues. Moreover, we also report 31 motif signatures predictive of enhancers’ broad activity. TELS codes and material are publicly available at http://www.cbrc.kaust.edu.sa/TELS.
基金This work was supported by the National Natural Science Foundation of China (Grant Nos. 30170507 and 30024004)the Natural Science Foundation of Yunnan Province (Grant Nos. 98C007R and 1999C0084M).
文摘The structure-based sequence motif of the distant proteins in evolution, protein tyrosine phosphatases (PTP) I and Ⅱ superfamilies, as an example, has been defined by the structural comparison, structure-based sequence alignment and analyses on substitution patterns of residues in common sequence conserved regions. And the phosphatases I and Ⅱ can be correctly identified together by the structure-based PTP sequence motif from SWISS-PROT and TrEBML databases. The results show that the correct rates of identification are over 98%. This is the first time to identify PTP I and Ⅱ together by this motif.