
A Fast and Effective Approach of Fold Recognition Based on Image Feature
Abstract: Understanding the relationship between protein structure and function is one of the most important research aims, and automatic classification of protein structures has become a major approach toward it. However, extracting compact and effective features that characterize protein structure remains a challenge. In this paper, the 3-D tertiary structure of a protein fold was mapped into a 2-D distance matrix, which was then regarded as a gray-level image. Based on the gray-level histogram and the gray level co-occurrence matrix (CoM), a computationally simple feature extraction method for fold structures was presented, yielding a low-dimensional feature vector that reflects the structural characteristics of the fold. The origin of the isolated histogram peak at gray level 0 was explained, and the structural meanings of the gray-level distribution, the different angles, and the pixel distances of the CoM features were analyzed in detail. Finally, the method was validated on the classification of 27 fold types and compared with several fold recognition methods based on sequence or structure. It achieved an accuracy of 71.95% on the independent test set, using 5-fold cross-validation (5-CV) to select the parameters of the support vector machine (SVM), and 78.94% under 10-fold cross-validation (10-CV) on the combined training and test data. The results show that the method, with its low-dimensional and compact features, classifies multiple fold types effectively and efficiently without requiring a complicated classifier system.
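The abstract does not give the paper's exact gray-level mapping, quantization, or choice of co-occurrence statistics, so the following is only a minimal Python sketch of the general pipeline it describes (distance matrix, gray image, histogram plus CoM statistics), not the authors' implementation. It assumes C-alpha coordinates as input, a hypothetical linear quantization into 8 gray levels with distances clipped at 20 Å, pixel distances 1 and 2 at the four standard angles (0°, 45°, 90°, 135°), and the common Haralick-style statistics contrast, angular second moment, homogeneity, and entropy. All function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
GrayImage = List[List[int]]

# Hypothetical defaults (not from the paper): 8 gray levels, distances clipped at 20 Å.
LEVELS = 8
D_MAX = 20.0
ANGLES = (0, 45, 90, 135)


def distance_matrix(coords: Sequence[Coord]) -> List[List[float]]:
    """Pairwise Euclidean distances between residues (assumed C-alpha coordinates)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def to_gray(dist: List[List[float]], levels: int = LEVELS, d_max: float = D_MAX) -> GrayImage:
    """Linearly quantize distances into gray levels 0..levels-1 (assumed mapping).

    Distances >= d_max are clipped to the top level; the diagonal (distance 0) is level 0.
    """
    top = levels - 1
    return [[min(top, int(d / d_max * levels)) for d in row] for row in dist]


def histogram(img: GrayImage, levels: int = LEVELS) -> List[float]:
    """Normalized gray-level histogram (fraction of pixels at each level)."""
    counts = [0] * levels
    for row in img:
        for g in row:
            counts[g] += 1
    total = sum(counts)
    return [c / total for c in counts]


def glcm(img: GrayImage, levels: int, d: int, angle: int) -> List[List[float]]:
    """Normalized, symmetric co-occurrence matrix for pixel distance d at the given angle."""
    # (row, col) offset; rows grow downward, so 45 degrees points up and to the right.
    dr, dc = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}[angle]
    n_rows, n_cols = len(img), len(img[0])
    P = [[0.0] * levels for _ in range(levels)]
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                i, j = img[r][c], img[r2][c2]
                P[i][j] += 1
                P[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, P))
    return [[v / total for v in row] for row in P] if total else P


def glcm_stats(P: List[List[float]]) -> Dict[str, float]:
    """Haralick-style statistics; the paper's exact set is not stated in the abstract."""
    stats = {"contrast": 0.0, "asm": 0.0, "homogeneity": 0.0, "entropy": 0.0}
    for i, row in enumerate(P):
        for j, p in enumerate(row):
            stats["contrast"] += (i - j) ** 2 * p
            stats["asm"] += p * p  # angular second moment
            stats["homogeneity"] += p / (1 + (i - j) ** 2)
            if p > 0:
                stats["entropy"] -= p * math.log2(p)
    return stats


def fold_features(coords: Sequence[Coord], levels: int = LEVELS,
                  d_max: float = D_MAX, pixel_distances: Sequence[int] = (1, 2)) -> List[float]:
    """Fixed-length vector: histogram + CoM statistics for every (distance, angle) pair."""
    img = to_gray(distance_matrix(coords), levels, d_max)
    vec = histogram(img, levels)
    for d in pixel_distances:
        for angle in ANGLES:
            vec.extend(glcm_stats(glcm(img, levels, d, angle)).values())
    return vec


if __name__ == "__main__":
    # Toy "chain": 10 residues on a straight line, 3.8 Å apart (illustrative only).
    toy = [(3.8 * i, 0.0, 0.0) for i in range(10)]
    print(len(fold_features(toy)))  # 8 + 2 * 4 * 4 = 40
```

Because the image is summarized by a histogram and co-occurrence statistics rather than used pixel by pixel, the vector length (here 8 + 2 × 4 × 4 = 40) does not depend on protein length, so proteins of different sizes map into one common low-dimensional feature space that a standard multi-class SVM can consume.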
Source: Acta Biophysica Sinica (《生物物理学报》), 2009, No. 2, pp. 106-116 (11 pages). Indexed in: CAS, CSCD, Peking University Core Journals.
Funding: National Natural Science Foundation of China (60872145); Postdoctoral Science Foundation (20070421130).
Keywords: Fold recognition; Histogram; Gray level co-occurrence matrix; Image analysis; Support vector machines