Abstract
Classifying membrane proteins from amino acid composition alone ignores the correlation (sequence-order) information between residues, and a traditional support vector machine (SVM) leaves unclassifiable regions when applied to multi-class problems. To address both issues, the amino acid composition, the dipeptide composition, and six types of amino acid index correlation coefficients were computed for each protein sequence and combined into a single feature vector, and a fuzzy support vector machine (FSVM) was used as the classifier to resolve the unclassifiable regions that arise in multi-class recognition. In 5-fold cross-validation tests, FSVM outperformed the traditional SVM given the same feature input, and, given the same classifier, the combination of all three feature types outperformed any one or two of them. In independent dataset tests, the strategy combining these features with FSVM reached an overall prediction accuracy of 97%, outperforming many existing classification strategies; it is one of the most effective methods currently available for membrane protein classification.
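The combined feature vector described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which six amino acid indices, which correlation formula, or which lags were used, so the sketch makes these assumptions: a single index (the Kyte-Doolittle hydropathy scale) stands in for the six, the correlation coefficient is taken to be a plain lagged product average, and the function names and the `max_lag` parameter are hypothetical. Sequences are assumed to contain only the 20 standard residue letters.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used only as a stand-in index (assumption:
# the abstract does not name the six indices used in the paper).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def amino_acid_composition(seq):
    """Return the 20 residue frequencies, in AMINO_ACIDS order."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the 400 frequencies of adjacent residue pairs."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def autocorrelation(seq, index, lag):
    """Average product of index values for residues `lag` positions apart
    (an assumed form of the correlation coefficient)."""
    values = [index[a] for a in seq]
    n = len(values)
    return sum(values[i] * values[i + lag] for i in range(n - lag)) / (n - lag)


def feature_vector(seq, indices, max_lag=1):
    """Concatenate the three feature groups into one vector."""
    vec = amino_acid_composition(seq) + dipeptide_composition(seq)
    for index in indices:
        for lag in range(1, max_lag + 1):
            vec.append(autocorrelation(seq, index, lag))
    return vec
```

With one index and `max_lag=1`, `feature_vector` returns 20 + 400 + 1 = 421 values per sequence; a vector of this kind would then be passed to the classifier.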
Source
Journal of Biomedical Engineering Research (《生物医学工程研究》)
2007, Issue 4, pp. 299-304 (6 pages)
Funding
National Natural Science Foundation of China (Grant No. 60603054)
Keywords
Fuzzy support vector machine
Auto-correlation function
Classification strategy
Membrane protein
Transmembrane protein