Biological Evidence Sentence Extraction with Combination of LS-SVM and Conditional Random Field

(Original title: LS-SVM与条件随机场结合的生物证据句子抽取; cited by: 2)

Abstract: For Gene Ontology Evidence Sentence (GOES) extraction, systems built on traditional features and a Bayesian classification model are inefficient, which leads to low recall. To address this, two systems are built, one for single-sentence extraction and one for mixed multi-sentence extraction. System 1 combines a Least Squares Support Vector Machine (LS-SVM) model with a new feature combination and a sentence filtering module, which addresses the incomplete coverage of traditional features. System 2 integrates a Conditional Random Field (CRF) model and candidate-sentence identification rules into System 1, which addresses the merging of consecutive sentences. Experimental results show that, for single-sentence extraction, System 1 improves recall and F-value by 39.7% and 12.9%, respectively, over the Bayesian-model baseline system. For mixed multi-sentence extraction, System 2 improves recall by 37.1% over the system based on learning from positive and unlabeled samples (LPU).
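To make the LS-SVM component of System 1 more concrete, here is a minimal sketch of a standard LS-SVM binary classifier, which is trained by solving one linear system instead of a quadratic program. It is not the authors' implementation: the RBF kernel, the values of `gamma` and `sigma`, the function names, and the two-dimensional toy vectors standing in for sentence features are all assumptions made for illustration.

```python
import math

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def train_lssvm(X, y, gamma=10.0, sigma=1.0):
    """Build and solve the LS-SVM system
        [ 0   y^T             ] [ b     ]   [ 0 ]
        [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]
    with Omega[i][j] = y_i * y_j * K(x_i, x_j). Returns (b, alpha)."""
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = y[i]
        A[i + 1][0] = y[i]
        for j in range(n):
            A[i + 1][j + 1] = y[i] * y[j] * rbf_kernel(X[i], X[j], sigma)
        A[i + 1][i + 1] += 1.0 / gamma  # regularization on the diagonal
    solution = solve_linear_system(A, [0.0] + [1.0] * n)
    return solution[0], solution[1:]

def predict(x, X, y, b, alpha, sigma=1.0):
    """Return +1 or -1 from the sign of the LS-SVM decision function."""
    score = b + sum(a * yi * rbf_kernel(x, xi, sigma)
                    for a, yi, xi in zip(alpha, y, X))
    return 1 if score >= 0 else -1

# Hypothetical toy data: +1 stands for "evidence sentence", -1 for "other".
X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]]
y_train = [-1, -1, -1, 1, 1, 1]

b, alpha = train_lssvm(X_train, y_train)
train_predictions = [predict(x, X_train, y_train, b, alpha) for x in X_train]
```

In a real extraction system the vectors would come from the sentence features, and `gamma` and `sigma` would be tuned on held-out data. Because LS-SVM replaces the inequality constraints of a standard SVM with equality constraints, every training sample generally receives a nonzero `alpha`, so the model is not sparse.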

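Authors: Zhang Liyuan, Ji Donghong

Affiliation: Computer School, Wuhan University

Source: Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2015, No. 5, pp. 207-212 (6 pages)

Keywords: biological evidence sentence; feature combination; Support Vector Machine (SVM); Least Squares Support Vector Machine (LS-SVM); Conditional Random Field (CRF)

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]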
