Abstract
In scientific and technical literature, passages that review previous research, analyze existing problems, and propose solutions form part of a paper's innovation-related information. This work analyzes the logical reasoning behind problem-analysis passages and the discourse relations through which they appear in a paper, and combines citation distribution features, discourse relation features, and negative sentiment features to construct general-purpose semantic patterns for information extraction. Problem-analysis information is extracted from the original text of a paper by matching these predefined patterns. In addition, guide-word features and semantic similarity computation are used to extract the paper's main-work information. Taking the data mining literature as an example, the proposed method is evaluated against manual extraction results. The results show that the method extracts the corresponding information fairly accurately and can serve as a basic data source for clustering and recommending scientific papers.
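The pattern-matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cue lists, the numeric-bracket citation pattern, the one-sentence citation window, and the Jaccard token overlap used as a stand-in for semantic similarity are all assumptions made for illustration, since the abstract does not give the actual lexicons or similarity measure.

```python
import re

# Hypothetical cue lists; the paper's actual lexicons are not given in the abstract.
CITATION = re.compile(r"\[\d+(?:\s*[,-]\s*\d+)*\]")   # numeric citations such as [3] or [1-4]
CONTRAST_CUES = ("however", "but", "although", "nevertheless")
NEGATIVE_CUES = ("fail", "cannot", "ignore", "lack", "limited", "difficult")
GUIDE_WORDS = ("this paper", "we propose", "in this work", "our method")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def has_cue(sentence, cues):
    """True if any cue occurs at a word start (so 'fail' also matches 'fails')."""
    lowered = sentence.lower()
    return any(re.search(r"\b" + re.escape(cue), lowered) for cue in cues)


def jaccard(a, b):
    """Token-overlap similarity, an assumed stand-in for semantic similarity."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def extract_problem_analysis(text):
    """Keep sentences that sit at or right after a citation and combine
    a contrast cue (discourse relation) with a negative cue."""
    sents = split_sentences(text)
    hits = []
    for i, sent in enumerate(sents):
        cited = bool(CITATION.search(sent)) or (
            i > 0 and bool(CITATION.search(sents[i - 1]))
        )
        if cited and has_cue(sent, CONTRAST_CUES) and has_cue(sent, NEGATIVE_CUES):
            hits.append(sent)
    return hits


def extract_main_work(text, title=None):
    """Keep guide-word sentences; if a title is given, rank them by similarity to it."""
    hits = [s for s in split_sentences(text) if has_cue(s, GUIDE_WORDS)]
    if title is not None:
        hits.sort(key=lambda s: jaccard(s, title), reverse=True)
    return hits


sample = (
    "Earlier work clusters papers by keywords [3]. "
    "However, it cannot handle noisy data. "
    "In this paper, we propose a pattern-based extractor. "
    "The results are reported in Section 5."
)
problems = extract_problem_analysis(sample)
main_work = extract_main_work(sample)
```

On the sample text, `problems` holds the sentence beginning "However" and `main_work` holds the sentence beginning "In this paper". A real system would need a proper sentence splitter, domain-tuned lexicons, and a stronger similarity measure.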
Source
Journal of Shandong University (Natural Science) (《山东大学学报(理学版)》)
CAS
CSCD
PKU Core (北大核心)
2015, No. 3, pp. 11-19 (9 pages)
Funding
China University of Petroleum (Beijing) Fund Project (KYJJ2012-05-25)
National Science and Technology Major Project (2011ZX05023-005-006, 2011ZX0520-007-007)
Keywords
reference distribution
textual relations
semantic pattern
negative emotion
guide words