Abstract
Starting from the physicochemical properties of amino acids and drawing on the idea of coarse-grained description in physics, a new encoding method for protein sequences, Encoding Based on Grouped Weight (EBGW), is presented and applied, together with the component-coupled algorithm, to protein structural class prediction. On the standard set T359 of 359 proteins, the overall prediction accuracy is 99.72% in the resubstitution test and 91.09% in the jackknife test. With the same training dataset and the same prediction algorithm, the overall jackknife accuracy is about 7% higher than that of the method based on amino acid composition alone, and the accuracy for the α+β class is 15% higher. The experimental results show that EBGW encoding effectively extracts the structural information contained in the letter sequence of a protein.
Source
《计算机工程与应用》
CSCD
PKU Core Journals
2007, No. 7, pp. 38-40, 89 (4 pages)
Computer Engineering and Applications
Keywords
protein sequence (amino acid sequence)
characteristic sequence
component-coupled algorithm
structural class