
IESRL:An information extraction system for research level

Abstract

Purpose: To annotate the semantic information of research papers and extract their research level information, we seek a method for developing an information extraction system.

Design/methodology/approach: A semantic dictionary and a conditional random field model (CRFM) were used to annotate the semantic information of research papers. Based on the annotation results, the research level information was extracted through regular expressions. All functions were implemented on the Sybase platform.

Findings: In our experiment on carbon nanotube research, the precision and recall rates reached 65.13% and 57.75%, respectively, after the semantic properties of word classes were labeled, and the F-measure rose sharply from below 50% to 60.18% when semantic features were added. The experiment also showed that the information extraction system for research level (IESRL) can extract performance indicators from research papers rapidly and effectively.

Research limitations: Some text information, such as formatting and charts, may have been lost when the text was converted from PDF to TXT files. Semantic labeling of sentences could be insufficient because lexicons in the semantic dictionary carry multiple meanings.

Research implications: The established system can help researchers rapidly compare the level of different research papers and identify their implicit innovation value. It could also serve as an auxiliary tool for analyzing the research levels of various research institutions.

Originality/value: In this work, we established an information extraction system for research papers using a revised semantic annotation method based on CRFM and the semantic dictionary. Our system approaches the information extraction problem at two levels: the sentence level and the noun (phrase) level of research papers. Compared with extraction methods based on knowledge engineering and those based on machine learning, our system shows the advantages of both.
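The two-stage approach described in the abstract (semantic annotation first, then regular-expression extraction over the annotated text) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the `token/LABEL` format, the label set (`PROP`, `VAL`, `UNIT`, `MAT`, `O`), the sample sentence, and the function names are hypothetical and are not taken from the paper. The hand-written labels stand in for the output of the semantic dictionary and CRFM annotation step.

```python
import re

# Hypothetical annotated sentence. In the described system these labels would
# come from the semantic dictionary and CRF annotation; here they are written
# by hand, and the label names themselves are assumptions.
ANNOTATED = (
    "The/O yarn/MAT showed/O a/O tensile/PROP strength/PROP of/O "
    "1.2/VAL GPa/UNIT and/O a/O modulus/PROP of/O 50/VAL GPa/UNIT ./O"
)

# One possible pattern for a performance indicator:
# property tokens, optional filler tokens, then a value and a unit.
INDICATOR = re.compile(
    r"(?P<prop>(?:\S+/PROP\s+)+)"  # one or more property tokens
    r"(?:\S+/O\s+)*"               # optional unlabeled filler, e.g. "of"
    r"(?P<val>\S+)/VAL\s+"         # the numeric value
    r"(?P<unit>\S+)/UNIT"          # the unit
)


def strip_labels(span: str) -> str:
    """Drop the trailing /LABEL from every token in a span."""
    return " ".join(tok.rsplit("/", 1)[0] for tok in span.split())


def extract_indicators(annotated: str) -> list[tuple[str, float, str]]:
    """Return (property, value, unit) triples found in annotated text."""
    return [
        (strip_labels(m.group("prop")), float(m.group("val")), m.group("unit"))
        for m in INDICATOR.finditer(annotated)
    ]


if __name__ == "__main__":
    for prop, value, unit in extract_indicators(ANNOTATED):
        print(f"{prop}: {value} {unit}")
```

Because the pattern matches on labels rather than on surface words, the same rule would apply to any sentence whose tokens received these labels, which is one plausible reading of how annotation and rule-based extraction complement each other.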
Source: Chinese Journal of Library and Information Science (中国文献情报(英文版)), 2013, No. 4, pp. 16-27 (12 pages)
Funding: Supported by the National Social Science Foundation of China (Grant No. 12CTQ032)
Keywords: Research papers; Information extraction; Semantic labeling; Regular expression; Conditional random fields; Research level