Abstract
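Objective: To improve theoretical methods for predicting eukaryotic promoters. Methods: Based on the signal features and content features of promoter sequences, six standard discrete sources were constructed, and the increment of diversity of each sequence relative to these standard sources was calculated. Position weight matrices were built for the signal features, and the position weight scoring functions for the corresponding positions were computed. The two kinds of parameters were then input into a support vector machine to predict Drosophila promoters. Results: The algorithm was tested by both self-consistency and cross-validation, and both gave high prediction success rates; the success rates for five kinds of transcription factor binding sites all exceeded 91%. Conclusion: The increment of diversity algorithm combined with a support vector machine effectively improves the prediction success rate and is an effective method for predicting eukaryotic promoters.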
Objective: To improve the predictive capacity of the algorithm for eukaryotic promoter sequences. Method: Content vectors and signal vectors were extracted from the promoter sequences using six minimum increment of diversity values, three kinds of position weight matrices, and the GC percentage of the sequences. The calculated vectors were input into a support vector machine (SVM) algorithm to build a promoter classification model. Result: Human Pol II promoter sequences were predicted with the support vector machine, and 10-fold cross-validation and independent test data were used to validate the model. The results show that the overall prediction accuracy (sensitivity) and the specificity are both above 91%. Conclusion: These results indicate that combining the increment of diversity with a support vector machine is an effective method for predicting eukaryotic promoter sequences.
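The abstract names the increment of diversity but does not state its formula. The sketch below assumes the commonly used definition, D(X) = N log N - Σ n_i log n_i for a discrete source with counts n_i and total N, and ID(X, S) = D(X + S) - D(X) - D(S). The k-mer length and the function names (`kmer_counts`, `diversity`, `increment_of_diversity`) are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seq, k=3):
    """Count k-mers over the DNA alphabet; k=3 is an assumed choice."""
    seen = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [seen["".join(p)] for p in product("ACGT", repeat=k)]


def diversity(counts):
    """D(X) = N log N - sum(n_i log n_i), treating 0 log 0 as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(c * math.log(c) for c in counts if c > 0)


def increment_of_diversity(x, s):
    """ID(X, S) = D(X + S) - D(X) - D(S) for two count vectors of equal length."""
    merged = [a + b for a, b in zip(x, s)]
    return diversity(merged) - diversity(x) - diversity(s)
```

Under this definition a small value means the sequence's composition resembles the standard source. One such value per standard source, together with position weight matrix scores, would make up the feature vector passed to the support vector machine.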
Source
《生物技术》
CAS
CSCD
2008, Issue 2, pp. 39-42 (4 pages)
Biotechnology
Funding
Baoji University of Arts and Sciences Master's Research Start-up Project (ZK0791, ZK0792)