
Identification of Protein-protein Interaction Based on Hybrid Similarity Model (cited 2 times)
Abstract: Existing machine learning-based protein-protein interaction (PPI) identification systems make predictions using evidence from a single sentence only. They also suffer from small training sets because annotated data are scarce. To address these problems, this paper proposes a PPI identification approach based on a hybrid similarity model. A basic Relational Similarity (RS) model makes the initial predictions. Word similarity matrices are then computed from a large-scale corpus, a clustering algorithm groups words according to their similarity, and the resulting word clusters are introduced into the basic RS model to build a hybrid model. Experimental results show that the basic RS model achieves relatively high, well-balanced precision and recall, and that introducing the word similarity model further improves the F-score. Because the approach directly exploits known PPI information, it avoids the need for additional manual annotation.
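The abstract does not specify the paper's feature set, similarity measure, or clustering criterion, so the following is only a minimal sketch of the general pipeline it describes, under stated assumptions. Each protein pair is assumed to be represented by a bag of context words. Word similarity is assumed to be cosine similarity over window co-occurrence counts, and the clustering step is assumed to be average-linkage agglomerative clustering with a hypothetical threshold of 0.5. The hybrid model is assumed to append each word's cluster ID to the word features before KNN voting over known pairs. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts/Counters."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cooccurrence_vectors(corpus, window=2):
    """Assumed corpus-based word vectors: counts of neighbours within +/- window tokens."""
    vecs = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][tokens[j]] += 1
    return vecs


def cluster_words(vecs, words, threshold=0.5):
    """Assumed average-linkage agglomerative clustering; stop when best similarity < threshold."""
    clusters = [[w] for w in words]
    sim = {(a, b): cosine(vecs[a], vecs[b]) for a, b in combinations(words, 2)}

    def pair_sim(a, b):
        return sim.get((a, b), sim.get((b, a), 0.0))

    def linkage(c1, c2):
        return sum(pair_sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, bi, bj = -1.0, -1, -1
        for i, j in combinations(range(len(clusters)), 2):
            s = linkage(clusters[i], clusters[j])
            if s > best:
                best, bi, bj = s, i, j
        if best < threshold:
            break
        clusters[bi] = clusters[bi] + clusters[bj]  # bj > bi, so deleting bj is safe
        del clusters[bj]
    return {w: f"C{k}" for k, members in enumerate(clusters) for w in members}


def hybrid_features(words, cluster_of):
    """Hypothetical hybrid representation: word features plus one cluster-ID feature per word."""
    feats = Counter(words)
    feats.update(cluster_of[w] for w in words if w in cluster_of)
    return feats


def knn_predict(query, training, k=3):
    """Relational-similarity KNN: majority label among the k most similar known pairs."""
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy usage (hypothetical data): "binds" and "interacts" share contexts, so they cluster.
corpus = [
    "protein a binds protein b",
    "protein c interacts protein d",
    "gene x expressed in liver",
]
vecs = cooccurrence_vectors(corpus)
cluster_of = cluster_words(vecs, sorted({"binds", "interacts", "expressed"}))
training = [
    (hybrid_features(["binds"], cluster_of), True),       # known interacting pair
    (hybrid_features(["expressed"], cluster_of), False),  # known non-interacting pair
]
query = hybrid_features(["interacts"], cluster_of)
print(knn_predict(query, training, k=1))  # → True
```

With word features alone, "interacts" and "binds" have zero similarity, so the query could not be matched to the known positive pair; the shared cluster ID is what lets the hybrid representation generalize to unseen but similar words.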
Source: Computer Engineering, 2015, No. 7, pp. 25-30, 35 (7 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (Grant Nos. 61202132 and 61170043)
Keywords: Protein-protein Interaction (PPI); Relational Similarity (RS); word similarity; K-nearest Neighbor (KNN) classification; hierarchical clustering
