Weakly Supervised Protein-protein Interaction Identification Based on Distribution Hypothesis

Abstract: Protein-protein interaction (PPI) is an important research topic in biomedicine. The results of PPI experiments carried out by biomedical researchers are stored mainly in the form of published literature. As the volume of biomedical literature grows, collecting this information by hand can no longer meet practical needs. To address this, we propose a weakly supervised PPI identification method based on the distributional hypothesis. First, lexical patterns that express semantic relations are extracted from the contexts that describe protein interactions, and a small number of known interacting protein pairs form the initial seed set. Following the distributional hypothesis, a vector space model is built from the distribution of the lexical patterns over the seed set. Next, the lexical patterns are clustered by similarity into clusters of semantically similar patterns. These clusters are used to find new patterns in the corpus with similar distributions, which are added to a candidate set. Finally, the protein pairs in the candidate set and their patterns are scored, and the pairs that meet the selection criteria are added to the seed set for the next iteration. The iteration ultimately yields the interacting protein pairs. Unlike existing methods, this approach takes the semantic relatedness of the context into account, and experimental results show that it achieves high precision and recall with a very small seed set.
Authors: MAO Yu-wei (毛宇薇), NIU Yun (牛耘) (School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China)
Source: Computer Technology and Development (《计算机技术与发展》), 2018, Issue 9, pp. 34-37 (4 pages)
Funding: National Natural Science Foundation of China (61202132)
Keywords: protein-protein interaction; distributional hypothesis; weakly supervised method; relational similarity
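The abstract describes a bootstrapping loop but does not specify how patterns are weighted, how similarity is measured, which clustering algorithm is used, or how candidate pairs are scored. The Python sketch below shows one way the loop could fit together, under several assumptions. Each lexical pattern is treated as a sparse count vector over the protein pairs it co-occurs with. These vectors span all pairs in the corpus rather than only the seeds, so that patterns which never occur with a seed can still be compared. Similarity is cosine similarity, and clustering is a greedy single-pass threshold scheme. A candidate pair's score is the sum of the similarity weights of the candidate patterns it appears with. The function names, the thresholds `theta` and `min_score`, and the toy genes and patterns are hypothetical illustrations, not the authors' implementation or data.

```python
from collections import defaultdict
from math import sqrt


def pattern_vectors(occurrences):
    """Map each lexical pattern to a sparse count vector: protein pair -> count."""
    vecs = defaultdict(lambda: defaultdict(int))
    for a, b, pattern in occurrences:
        pair = tuple(sorted((a, b)))  # assumption: interactions are undirected
        vecs[pattern][pair] += 1
    return vecs


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm2 = sum(w * w for w in u.values()) * sum(w * w for w in v.values())
    return dot / sqrt(norm2) if norm2 else 0.0


def cluster(patterns, vecs, theta):
    """Greedy single-pass clustering (an assumed stand-in for the paper's method).

    Each cluster keeps the sum of its member vectors; cosine ignores scale,
    so this sum acts as the cluster centroid.
    """
    clusters = []  # list of (member patterns, summed vector)
    for p in patterns:
        for members, centroid in clusters:
            if cosine(vecs[p], centroid) >= theta:
                members.append(p)
                for k, w in vecs[p].items():
                    centroid[k] += w
                break
        else:  # no existing cluster was similar enough: start a new one
            clusters.append(([p], defaultdict(int, vecs[p])))
    return clusters


def bootstrap(occurrences, seeds, theta=0.4, min_score=0.8, max_iter=5):
    """Hypothetical loop: expand the seed set until no new pair qualifies."""
    vecs = pattern_vectors(occurrences)
    seeds = {tuple(sorted(s)) for s in seeds}
    for _ in range(max_iter):
        # Patterns observed with at least one current seed pair.
        seed_patterns = [p for p, v in vecs.items() if any(pair in seeds for pair in v)]
        clusters = cluster(seed_patterns, vecs, theta)
        # Candidate patterns: any pattern close enough to some cluster centroid,
        # weighted by its best similarity to a centroid.
        candidates = {}
        for p, v in vecs.items():
            w = max((cosine(v, c) for _, c in clusters), default=0.0)
            if w >= theta:
                candidates[p] = w
        # Score each non-seed pair by the weighted counts of its candidate patterns.
        scores = defaultdict(float)
        for p, w in candidates.items():
            for pair, n in vecs[p].items():
                if pair not in seeds:
                    scores[pair] += w * n
        new_pairs = {pair for pair, s in scores.items() if s >= min_score}
        if not new_pairs:
            break
        seeds |= new_pairs
    return seeds


if __name__ == "__main__":
    toy = [  # hypothetical (protein, protein, lexical pattern) occurrences
        ("RAD51", "BRCA2", "X binds Y"),
        ("TP53", "MDM2", "X binds Y"),
        ("TP53", "MDM2", "X interacts with Y"),
        ("CDK2", "CCNE1", "X interacts with Y"),
        ("ACTB", "GAPDH", "X and Y were measured"),
    ]
    print(sorted(bootstrap(toy, [("RAD51", "BRCA2")])))
    # → [('BRCA2', 'RAD51'), ('CCNE1', 'CDK2'), ('MDM2', 'TP53')]
```

On this toy corpus, the first iteration adds only MDM2-TP53, with a score of 1.0 + 0.5 = 1.5. Once that pair is a seed, "X interacts with Y" joins the cluster, and CCNE1-CDK2 reaches a score of about 0.87 in the second iteration. The pair seen only with "X and Y were measured" is never selected. This illustrates how a small seed set can grow through distributionally similar patterns, as the abstract describes.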