Predicting Protein Function Method with Node Importance Algorithm Based on Hadoop
Abstract Starting from protein sequence data, this paper constructs a protein network by cyclic (circular) sequence-similarity matching and predicts protein function with an algorithm that ranks the importance of network nodes. Taking node importance as the object of study, the node importance algorithm PageRank (PR) is applied to the protein network to compute the PR value of each protein node. The function prediction is implemented as a parallel computation on the Hadoop platform to reduce running time, which makes the method better suited to very large genome databases. Finally, the results are measured with three indicators (precision, recall, and F1-measure) and compared with traditional function prediction methods to verify their validity; the comparison indicates that the method is feasible and of practical value.
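Source Computer Systems & Applications (《计算机系统应用》), 2016, No. 5, pp. 77-82 (6 pages)
Funding Natural Science Foundation of Fujian Province (2014J01220); Sanming University Research Fund (B201201/G); Science and Technology Fund of the Education Department of Fujian Province (JB13187, JA15463)
Keywords protein sequence; function prediction; circular matching; node importance; Hadoop platform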
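
The abstract describes computing PageRank values for protein nodes as a parallel job and scoring predictions by precision, recall, and F1-measure. The sketch below illustrates those two ideas in plain Python and is not the authors' implementation. It simulates one map phase and one reduce phase per PageRank iteration in a single process rather than on Hadoop. The toy network, the protein IDs (P1 to P4), the function labels, the damping factor of 0.85, and all function names are assumptions chosen for illustration. How the paper turns PR values into function labels is not stated in the abstract, so that step is left out.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map step: each node splits its current rank evenly among its neighbors."""
    for node, neighbors in graph.items():
        if not neighbors:
            continue  # dangling nodes are handled separately in pagerank()
        share = ranks[node] / len(neighbors)
        for neighbor in neighbors:
            yield neighbor, share


def reduce_phase(pairs, nodes, damping, dangling_mass):
    """Reduce step: sum contributions per node, then apply the damping formula."""
    sums = defaultdict(float)
    for node, contribution in pairs:
        sums[node] += contribution
    n = len(nodes)
    return {
        v: (1.0 - damping) / n + damping * (sums[v] + dangling_mass / n)
        for v in nodes
    }


def pagerank(graph, damping=0.85, tol=1e-9, max_iter=200):
    """Iterate map/reduce rounds until the total rank change falls below tol."""
    nodes = list(graph)
    ranks = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without neighbors is spread uniformly.
        dangling_mass = sum(ranks[v] for v in nodes if not graph[v])
        new_ranks = reduce_phase(map_phase(graph, ranks), nodes, damping, dangling_mass)
        delta = sum(abs(new_ranks[v] - ranks[v]) for v in nodes)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


def precision_recall_f1(predicted, actual):
    """Compare a set of predicted function labels with the true label set."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical similarity network: an edge means two sequences matched.
toy_network = {
    "P1": ["P2", "P3", "P4"],
    "P2": ["P1", "P3"],
    "P3": ["P1", "P2"],
    "P4": ["P1"],
}

ranks = pagerank(toy_network)
for protein, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(protein, round(value, 4))

# Hypothetical label sets for a single protein.
print(precision_recall_f1({"func_a", "func_b", "func_c"},
                          {"func_b", "func_c", "func_d", "func_e"}))
```

In a real Hadoop job the map and reduce steps would run as distributed tasks over partitions of the adjacency list, with one job (or one chained stage) per iteration; the single-process loop here only mirrors that data flow.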