
Biological Evidence Sentence Extraction with Combination of LS-SVM and Conditional Random Field (cited by: 2)
Abstract: For Gene Ontology Evidence Sentence (GOES) extraction, systems built on traditional features and a Bayesian classification model are inefficient, which leads to low recall. To address this, two systems are built, one for single-sentence extraction and one for mixed multi-sentence extraction. System 1 combines a Least Squares Support Vector Machine (LS-SVM) model with a new feature combination and a sentence filtering module, which addresses the incomplete coverage of traditional features. System 2 adds a Conditional Random Field (CRF) model and candidate-sentence identification rules to System 1, which addresses the merging of consecutive sentences. Experimental results show that, on single-sentence extraction, System 1 improves recall and F-value by 39.7% and 12.9%, respectively, over a baseline system based on the Bayesian model. On mixed multi-sentence extraction, System 2 improves recall by 37.1% over the Learning from Positive and Unlabeled Documents for Retrieval (LPU) system.
Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2015, No. 5, pp. 207-212 (6 pages)
Keywords: biological evidence sentence; feature combination; Support Vector Machine (SVM); Least Squares Support Vector Machine (LS-SVM); Conditional Random Field (CRF)
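
The abstract names an LS-SVM classifier as the core of System 1 but gives no implementation details. The following is a minimal sketch of the standard LS-SVM classifier formulation, in which training reduces to solving one linear system instead of a quadratic program. Everything specific here is an assumption made for illustration and is not taken from the paper: the RBF kernel, the values of `gamma` and `sigma`, the function names, and the toy two-dimensional points standing in for sentence feature vectors.

```python
import numpy as np


def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Train an LS-SVM classifier; labels y must be +1 or -1.

    Solves the linear system
        [ 0   y^T             ] [ b     ]   [ 0 ]
        [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    with Omega[i, j] = y[i] * y[j] * K(x[i], x[j]).
    """
    n = X.shape[0]
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]  # bias b, multipliers alpha


def lssvm_predict(X_train, y_train, alpha, b, X_new, sigma=1.0):
    """Classify new points by the sign of sum_i alpha_i y_i K(x, x_i) + b."""
    scores = rbf_kernel(X_new, X_train, sigma) @ (alpha * y_train) + b
    return np.where(scores >= 0, 1, -1)


# Hypothetical toy data: label +1 plays the role of "evidence sentence",
# label -1 the role of "non-evidence sentence".
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])

b, alpha = lssvm_fit(X, y)
predictions = lssvm_predict(X, y, alpha, b, np.array([[0.5, 0.5], [5.5, 5.5]]))
print(predictions)
```

In a real extraction system the rows of `X` would be feature vectors computed from sentences, and the regularization and kernel parameters would be tuned on held-out data. The sketch only shows the classifier step, not the feature combination, sentence filtering, or CRF components the abstract describes.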
