Abstract
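Proteins can take part in normal biological activity only at specific subcellular locations (such as the nucleus, mitochondria, or cytoplasm), so information on a protein's subcellular localization is important for understanding its function. This paper proposes a weighted-voting algorithm over multiple fuzzy k-nearest-neighbor classifiers for protein subcellular localization. The position-specific scoring matrix (PSSM) obtained from a PSI-BLAST search and 1st- to 7th-order amino acid pair information are used as input features to build eight separate fuzzy k-nearest-neighbor classifiers, and the final prediction is obtained by weighted voting over the results of all classifiers. In a jackknife test on the RH-2427 data set, which contains four classes of subcellular location, the overall prediction accuracy reached 88.1%, better than several other prediction methods, including a single fuzzy k-nearest-neighbor classifier. The method can also be readily extended to predict additional subcellular locations such as the chloroplast, Golgi apparatus, and lysosome.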
Subcellular location is one of the key functional characteristics of a protein, because proteins can perform normal biological functions only after they are translocated to the correct subcellular locations. In this paper, a novel method based on a weighted fuzzy k-nearest neighbors algorithm is introduced, in which the position-specific scoring matrix (PSSM) generated from PSI-BLAST profiles, amino acid composition, and 1st- to 7th-order dipeptide compositions of the protein are used. In the method, eight fuzzy K-NN classifiers are constructed and the final prediction is made by jury voting of these classifiers. In a jackknife test on the RH-2427 data set, which contains four different subcellular locations, the total prediction accuracy reached 88.1%, higher than all the other methods compared, including the single fuzzy K-NN method. Moreover, this method can be easily extended to classify more subcellular locations, e.g. the Golgi apparatus, chloroplast, lysosome, and other important organelles in living cells.
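The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions made only for illustration: a "k-th order" amino acid pair is taken to mean the residues at positions i and i+k, the fuzzy k-NN membership uses the common inverse-distance form with fuzzifier m = 2, and the per-classifier voting weights are supplied by the caller, since the abstract does not say how they are chosen. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def gapped_dipeptide_composition(seq, order):
    """Frequencies of residue pairs (seq[i], seq[i + order]).

    Assumption: order=1 means adjacent residues, order=2 skips one, etc.
    """
    counts = Counter(seq[i] + seq[i + order] for i in range(len(seq) - order))
    total = sum(counts[p] for p in PAIRS) or 1  # avoid division by zero
    return [counts[p] / total for p in PAIRS]


def fuzzy_knn_memberships(train_X, train_y, x, classes, k=3, m=2.0):
    """Class memberships of x from its k nearest training points.

    Each neighbor votes for its own label with weight 1 / d^(2/(m-1)).
    """
    nearest = sorted((math.dist(t, x), y) for t, y in zip(train_X, train_y))[:k]
    weighted = [(1.0 / max(d, 1e-12) ** (2.0 / (m - 1.0)), y) for d, y in nearest]
    total = sum(w for w, _ in weighted)
    return {c: sum(w for w, y in weighted if y == c) / total for c in classes}


def weighted_vote(memberships, weights):
    """Combine per-classifier memberships; weights are caller-supplied."""
    classes = memberships[0].keys()
    scores = {c: sum(w * mem[c] for mem, w in zip(memberships, weights))
              for c in classes}
    return max(scores, key=scores.get), scores


def jackknife_accuracy(views, labels, classes, weights, k=3):
    """Leave-one-out accuracy; views[v][i] is protein i under feature set v."""
    correct = 0
    for i in range(len(labels)):
        rest_y = labels[:i] + labels[i + 1:]
        mems = [fuzzy_knn_memberships(X[:i] + X[i + 1:], rest_y, X[i], classes, k)
                for X in views]
        predicted, _ = weighted_vote(mems, weights)
        correct += predicted == labels[i]
    return correct / len(labels)
```

Under these assumptions, each of the eight classifiers would receive one feature set (for example, a PSSM-derived vector or `gapped_dipeptide_composition(seq, order)` for one order from 1 to 7), and `jackknife_accuracy` would hold out each protein in turn, as in the jackknife test reported above.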
Source
《中国生物医学工程学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2006, Issue 1, pp. 106-109 (4 pages)
Chinese Journal of Biomedical Engineering
Funding
Major Project of the Knowledge Innovation Program of the Chinese Academy of Sciences
Keywords
fuzzy k-nearest neighbor
voting algorithm
subcellular localization
bioinformatics
weighted fuzzy K-NN
machine learning