IESRL: An information extraction system for research level


Abstract

Purpose: To annotate the semantic information in research papers and extract information about their research level, we sought a method for building an information extraction system.

Design/methodology/approach: A semantic dictionary and a conditional random field model (CRFM) were used to annotate the semantic information in research papers. Based on the annotation results, research-level information was extracted with regular expressions. All functions were implemented on the Sybase platform.

Findings: In our experiment on carbon nanotube research, precision and recall reached 65.13% and 57.75%, respectively, after the semantic properties of word classes were labeled. Adding semantic features raised the F-measure from below 50% to 60.18%. The experiment also showed that the information extraction system for research level (IESRL) can extract performance indicators from research papers quickly and effectively.

Research limitations: Some information, such as formatting and charts, may have been lost when the papers were converted from PDF to TXT files. Semantic labeling of sentences may be insufficient because entries in the semantic dictionary can carry many meanings.

Research implications: The system can help researchers quickly compare the research level of different papers and uncover their implicit innovation value. It could also serve as an auxiliary tool for analyzing the research level of research institutions.

Originality/value: We built an information extraction system for research papers using a revised semantic annotation method based on CRFM and the semantic dictionary. The system handles information extraction at two levels of a research paper: the sentence level and the noun (phrase) level. Methods based on knowledge engineering and methods based on machine learning each have advantages, and our system combines the advantages of both.
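The abstract says that a semantic dictionary and a CRF model were used together, and that adding semantic features raised the F-measure. It does not list the features themselves or name the CRF toolkit. The sketch below shows one common way a dictionary lookup becomes token-level features for a linear-chain CRF. It makes several assumptions: the dictionary `SEMANTIC_DICT`, its classes (`PROPERTY`, `MATERIAL`, `UNIT`), and the functions `semantic_class`, `token_features` and `sentence_features` are all hypothetical illustrations, not IESRL's actual feature set.

```python
from typing import Dict, List

# Hypothetical semantic dictionary mapping lowercase terms to semantic classes.
# Illustrative only: the abstract does not describe IESRL's actual dictionary.
SEMANTIC_DICT: Dict[str, str] = {
    "tensile": "PROPERTY",
    "strength": "PROPERTY",
    "conductivity": "PROPERTY",
    "nanotube": "MATERIAL",
    "yarn": "MATERIAL",
    "gpa": "UNIT",
    "s/m": "UNIT",
}


def semantic_class(word: str) -> str:
    """Look up a word's semantic class; 'O' means 'not in the dictionary'."""
    return SEMANTIC_DICT.get(word.lower(), "O")


def token_features(tokens: List[str], i: int) -> Dict[str, float]:
    """Sparse binary features for tokens[i], in the style commonly fed to a linear-chain CRF."""
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "word.lower=" + word.lower(): 1.0,
        "sem=" + semantic_class(word): 1.0,  # dictionary-based semantic feature
    }
    # Numbers such as "1.2" are likely cues for performance indicators.
    if word.replace(".", "", 1).isdigit():
        feats["is_number"] = 1.0
    # One-token context window, including the neighbours' semantic classes.
    if i > 0:
        feats["-1:word.lower=" + tokens[i - 1].lower()] = 1.0
        feats["-1:sem=" + semantic_class(tokens[i - 1])] = 1.0
    else:
        feats["BOS"] = 1.0  # beginning of sentence
    if i < len(tokens) - 1:
        feats["+1:word.lower=" + tokens[i + 1].lower()] = 1.0
        feats["+1:sem=" + semantic_class(tokens[i + 1])] = 1.0
    else:
        feats["EOS"] = 1.0  # end of sentence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, float]]:
    """One feature dict per token; a CRF then labels the whole sequence jointly."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

For example, `sentence_features(["tensile", "strength", "of", "1.2", "GPa"])` gives "of" the feature `-1:sem=PROPERTY`. This lets a trained CRF learn that words following a property term are likely part of an indicator phrase, even when the word itself carries no dictionary class.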
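According to the abstract, research-level information such as performance indicators was extracted from the annotated text with regular expressions, but the patterns are not given. The following is a minimal sketch, assuming that an indicator appears as a property name, then a linking word, then a number and a unit. The property names, units, `INDICATOR_RE` and `extract_indicators` are hypothetical examples chosen for carbon nanotube papers, not IESRL's rules.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: <property> <linking word> [about|approximately] <number> <unit>.
# Property names and units are illustrative assumptions, not the IESRL rules.
INDICATOR_RE = re.compile(
    r"(?P<property>tensile strength|Young's modulus|electrical conductivity)"
    r"\s+(?:of|was|is|reached|up to)\s+"
    r"(?:about\s+|approximately\s+)?"
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>GPa|MPa|S/m|%)",
    re.IGNORECASE,
)


def extract_indicators(sentence: str) -> List[Tuple[str, float, str]]:
    """Return (property, value, unit) triples found in one sentence."""
    return [
        (m.group("property").lower(), float(m.group("value")), m.group("unit"))
        for m in INDICATOR_RE.finditer(sentence)
    ]
```

Hand-written patterns like this are precise, but they only cover phrasings their author anticipated. A unit missing from the list, such as TPa, is silently skipped. The abstract presents IESRL as combining this kind of knowledge-engineering rule with machine-learned (CRF) annotation.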
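The Findings report precision, recall and F-measure. For reference, here is a minimal sketch of the standard set-based definitions, taking F-measure as the balanced F1 (the harmonic mean of precision and recall). The abstract does not state how its reported figures were computed or averaged, so this sketch does not reconstruct the paper's evaluation. The function names `precision_recall_f1` and `evaluate` are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-measure (F1) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def evaluate(predicted: Iterable[Hashable], gold: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Compare extracted items with a gold standard by exact set match."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # extracted and correct
    fp = len(pred - ref)  # extracted but not in the gold standard
    fn = len(ref - pred)  # in the gold standard but missed
    return precision_recall_f1(tp, fp, fn)
```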

Authors: Fuhai LENG, Rujiang BAI, Qingsong ZHU

Affiliations: National Science Library; Library of Shandong University of Technology

Source: Chinese Journal of Library and Information Science (Chinese title: 中国文献情报（英文版）), 2013, No. 4, pp. 16-27 (12 pages)

Funding: Supported by the National Social Science Foundation of China (Grant No. 12CTQ032)

Keywords: Research papers; Information extraction; Semantic labeling; Regular expression; Conditional random fields; Research level

Classification number: TP391.1 [Automation and Computer Technology / Computer Application Technology]
