
Structure Function Recognition of Academic Text: A Functional Framework and Recognition Based on Section Headers (cited by: 52)

The Structure Function of Academic Text and Its Classification
Abstract: Most current academic text mining research relies on word-based, window-based, or full-text methods and tends to ignore the internal structure of academic texts, which leads to many ambiguity problems. To address this gap, this paper proposes a structure function framework for research papers that defines the function of each section and the logical structure of academic text. On this basis, the paper discusses the automatic classification of structure function at three levels: based on section headers, based on section content and headers, and based on paragraphs. For the first level (section headers), it reports an automatic classification experiment that combines a vocabulary (word list) with sequence labeling, and the results were satisfactory.
Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), CSSCI, Peking University Core Journal, 2014, No. 9, pp. 979-985 (7 pages)
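Funding: A research output of the National Natural Science Foundation of China General Program project "Research on Modeling and Framework Implementation of General Entity Retrieval Based on Language Models" (Grant No. 71173164) and the Ministry of Education Major Project of Key Research Bases for Humanities and Social Sciences "Research on Fine-Grained Web Information Retrieval Models and Framework Construction" (Grant No. 10JJD630014).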
Keywords: text mining, structure function, automatic classification
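The first-level approach described in the abstract (a word list combined with sequence labeling over section headers) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the five labels, the cue words, and the hand-set scores are hypothetical and are not taken from the paper, and a small Viterbi decoder with fixed scores stands in for whatever trained sequence labeler the authors actually used.

```python
# Hypothetical label set and cue words (NOT the paper's framework or lexicon).
LEXICON = {
    "introduction": ["introduction", "background", "motivation"],
    "related_work": ["related work", "literature review", "previous work"],
    "method": ["method", "approach", "model", "framework", "algorithm"],
    "experiment": ["experiment", "evaluation", "result", "dataset"],
    "conclusion": ["conclusion", "discussion", "future work", "summary"],
}
LABELS = list(LEXICON)  # order encodes the assumed typical section order


def lexicon_label(title):
    """Return the first label whose cue word occurs in the header, else None."""
    text = title.lower()
    for label, cues in LEXICON.items():
        if any(cue in text for cue in cues):
            return label
    return None


def emission(title, label):
    """Hand-set score: reward a lexicon hit, stay neutral when no cue matches."""
    hit = lexicon_label(title)
    if hit is None:
        return 0.0
    return 2.0 if hit == label else -2.0


def transition(i, j):
    """Hand-set score favoring a move to the next function in the assumed order."""
    if j == i + 1:
        return 0.0
    if j == i:
        return -0.5
    if j > i:
        return -1.0 * (j - i - 1)  # penalize skipping functions
    return -2.0  # penalize going backwards


def label_sections(titles):
    """Viterbi decoding of the best label sequence for a list of section headers."""
    if not titles:
        return []
    n = len(LABELS)
    # Assumed prior: papers tend to open with the first label.
    scores = [-1.0 * j + emission(titles[0], LABELS[j]) for j in range(n)]
    back = []
    for title in titles[1:]:
        new_scores, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: scores[i] + transition(i, j))
            new_scores.append(
                scores[best_i] + transition(best_i, j) + emission(title, LABELS[j])
            )
            pointers.append(best_i)
        scores = new_scores
        back.append(pointers)
    # Trace the best path backwards from the highest-scoring final label.
    j = max(range(n), key=lambda k: scores[k])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [LABELS[k] for k in path]


if __name__ == "__main__":
    headers = [
        "1 Introduction",
        "2 Related Work",
        "3 Our Proposal",  # no cue word: resolved from its neighbors
        "4 Experiments and Results",
        "5 Conclusion",
    ]
    for header, label in zip(headers, label_sections(headers)):
        print(f"{header} -> {label}")
```

The point of the sequence step is that a header with no cue word, such as "3 Our Proposal" in the usage example, still receives a label from its position between recognized neighbors. In a real system the scores would be learned from annotated papers rather than set by hand.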


