Abstract
In this work, we systematically analyzed the amino acid compositions of secondary structures in acidic and alkaline enzymes and compared them with those of neutral enzymes. We found that acidic and alkaline enzymes show different amino acid usage preferences when forming particular secondary structures. In addition, both acidic and alkaline enzymes contain markedly higher proportions of neutral residues and of residues with tiny side chains, which may be a general stability mechanism for their adaptation to pH extremes. Based on these findings, we proposed a secondary structure amino acid composition method for extracting features from protein sequences. Its overall prediction accuracy, evaluated by 10-fold cross-validation, reached 80.3%. Compared with other common feature extraction methods, our method improved the overall prediction accuracy by 9.4% to 18.7%. The random forests algorithm also outperformed other machine learning techniques, with improvements ranging from 2.7% to 21.8%.
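The abstract does not define the feature vector exactly. A minimal sketch of one plausible reading follows. It assumes a three-state secondary structure string (H = helix, E = strand, C = coil) aligned residue by residue with the sequence, for example from a predictor or from DSSP output reduced to three states, and it computes the composition of the 20 standard amino acids separately within each state, giving 60 features. The function name `ss_aa_composition` and this exact layout are hypothetical, not the authors' published definition.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, fixed order
SS_STATES = "HEC"  # assumed 3-state alphabet: helix, strand, coil


def ss_aa_composition(sequence: str, ss: str) -> list[float]:
    """Hypothetical 60-dim feature vector (assumption, not the paper's spec).

    For each secondary structure state, return the fraction of residues
    assigned to that state that are each of the 20 standard amino acids.
    Non-standard residue letters count toward a state's total but get no
    feature of their own. A state with no residues yields 20 zeros.
    """
    if len(sequence) != len(ss):
        raise ValueError("sequence and secondary structure must have equal length")
    features: list[float] = []
    for state in SS_STATES:
        # Collect the residues whose secondary structure label matches this state.
        residues = [aa for aa, s in zip(sequence.upper(), ss.upper()) if s == state]
        counts = Counter(residues)
        total = len(residues)
        for aa in AMINO_ACIDS:
            features.append(counts[aa] / total if total else 0.0)
    return features


vec = ss_aa_composition("ACDH", "HHEC")
print(len(vec))  # 60 features: 20 for H, then 20 for E, then 20 for C
```

Vectors built this way could then be fed to a classifier such as random forests and scored with 10-fold cross-validation. Grouping residues by secondary structure state is what separates this kind of feature from a plain whole-sequence amino acid composition.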
Source
Chinese Journal of Biotechnology (《生物工程学报》)
CAS
CSCD
Peking University Core Journal
2009, No. 10, pp. 1508-1515 (8 pages)
Funding
National Key Basic Research Program of China (973 Program) (No. 2007CB707804)
National Natural Science Foundation of China (No. 20806031)
Keywords
secondary structure
amino acid composition
acidic enzyme
alkaline enzyme
mechanism of stability
feature extraction
random forests