Funding: Supported by the State Basic Research Program of China (Grant No. 2003CB715905), the National Natural Science Foundation of China (Grant Nos. 30300071, 30770499 and 10721403), and the Youth Foundation of the College of Engineering of Peking University.
Abstract: Splice site prediction benefits from algorithms that combine the sequence patterns of regulatory elements, such as enhancers and silencers, with the patterns of the splicing signals themselves. In this paper, a statistical model of splicing signals was built using the entropy density profile (EDP) method, the weight array method (WAM) and the κ test. In addition, a model of splicing regulatory elements was developed with an unsupervised self-learning method that detects motifs associated with regulatory elements. With the two models incorporated, a multi-level support vector machine (SVM) system was devised to perform ab initio prediction of splice sites from DNA sequence in eukaryotic genomes. Results of large-scale tests on human genomic splice sites show that the new method achieves comparatively high performance in splice site prediction. The method performs at least as well as, and usually better than, the existing SpliceScan method, which is based on modeling regulatory elements, and it is more accurate than traditional methods that model splicing signals, such as GeneSplicer. In particular, the method has a clear advantage in splice site prediction for genes with lower GC content.
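The weight array method named in the abstract is commonly described as a position-specific first-order Markov model: each base in an aligned window around a candidate site is scored conditionally on the base before it. The following is a minimal sketch of that idea only, not the authors' implementation. The function names, the pseudocount, the use of a log-odds score against a decoy model, and the toy sequences are all assumptions made for illustration; the EDP, κ test, motif-learning and SVM stages are not shown.

```python
import math

ALPHABET = "ACGT"


def train_wam(seqs, pseudocount=1.0):
    """Estimate a WAM-style model from aligned, equal-length DNA windows.

    Returns (first, trans): first[b] is P(base b at position 0), and
    trans[i-1][a][b] is P(base b at position i | base a at position i-1).
    The pseudocount (an assumption) keeps unseen transitions non-zero.
    """
    length = len(seqs[0])
    first_counts = {b: pseudocount for b in ALPHABET}
    for s in seqs:
        first_counts[s[0]] += 1
    total = sum(first_counts.values())
    first = {b: c / total for b, c in first_counts.items()}

    trans = []
    for i in range(1, length):
        counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
        for s in seqs:
            counts[s[i - 1]][s[i]] += 1
        trans.append({
            a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
            for a in ALPHABET
        })
    return first, trans


def log_prob(model, seq):
    """Log-probability of one window under a trained model."""
    first, trans = model
    lp = math.log(first[seq[0]])
    for i in range(1, len(seq)):
        lp += math.log(trans[i - 1][seq[i - 1]][seq[i]])
    return lp


def wam_score(site_model, decoy_model, seq):
    """Log-odds of the window: positive values favour the site model."""
    return log_prob(site_model, seq) - log_prob(decoy_model, seq)


# Toy, made-up 9-base windows (illustrative only, not real training data).
donor_like = ["CAGGTAAGT", "AAGGTGAGT", "CAGGTAAGA", "GAGGTAAGT"]
decoys = ["CTGGTCCTA", "TTGGTTCCA", "ACGGTCTTG", "GCGGTACCT"]

site_model = train_wam(donor_like)
decoy_model = train_wam(decoys)

print(wam_score(site_model, decoy_model, "CAGGTAAGT"))
print(wam_score(site_model, decoy_model, "CTGGTCCTA"))
```

In a pipeline like the one the abstract outlines, a score of this kind would be one input feature among several passed to a classifier, rather than a final prediction on its own.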