Abstract
This study investigates methods for extracting, aggregating, and visualizing information about interdisciplinary information sources on the basis of review articles. As a worked example, it extracts information-source data from high-quality review articles and builds an information source map for the interdisciplinary field of "machine learning". First, the collected articles are preprocessed, discourse relations are identified, and an auxiliary dictionary is built. The dictionary and the auxiliary dictionary are then used for word segmentation, entity recognition, and part-of-speech tagging. Next, grammatical and logical rules are applied to extract sentences and, from them, information-source data. Information about the same source that appears in multiple articles is then merged. Finally, the information source map is drawn with the D3 tool, providing navigation of high-value information sources in the interdisciplinary field along two dimensions, time and source. Experiments show that the method performs well, with high recall and precision in information extraction, and that it helps interdisciplinary researchers quickly grasp the development of a field, its high-value information sources, the disciplines involved, and the characteristic research directions of each discipline.
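The extraction and aggregation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary entries, the citation pattern, the function names (`split_sentences`, `extract_records`, `aggregate`), and the two toy review snippets are all assumptions made up for illustration, and a simple regular-expression rule stands in for the paper's grammatical and logical rules.

```python
import re
from collections import defaultdict

# Hypothetical auxiliary dictionary: surface form -> canonical topic name.
AUX_DICT = {
    "support vector machine": "Support Vector Machine",
    "svm": "Support Vector Machine",
    "backpropagation": "Backpropagation",
}

# Assumed rule: a candidate sentence contains a citation such as "(Cortes, 1995)".
CITATION = re.compile(r"\(([A-Z][a-z]+), (\d{4})\)")


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_records(doc_id, text):
    """Return (topic, author, year, doc_id) for sentences that contain
    both a dictionary term and a citation."""
    records = []
    for sentence in split_sentences(text):
        cites = CITATION.findall(sentence)
        if not cites:
            continue
        lowered = sentence.lower()
        for surface, canonical in AUX_DICT.items():
            if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
                for author, year in cites:
                    records.append((canonical, author, int(year), doc_id))
    return records


def aggregate(records):
    """Merge records that point to the same source across reviews,
    then order them by year for a time-based map."""
    merged = defaultdict(set)
    for topic, author, year, doc_id in records:
        merged[(topic, author, year)].add(doc_id)
    rows = [
        {"topic": t, "author": a, "year": y, "reviews": sorted(ids)}
        for (t, a, y), ids in merged.items()
    ]
    return sorted(rows, key=lambda r: (r["year"], r["topic"]))


# Toy input written for this sketch, not taken from the study's corpus.
reviews = {
    "review_A": "The support vector machine was introduced for "
                "classification (Cortes, 1995). It remains widely used.",
    "review_B": "Early work on SVM focused on margins (Cortes, 1995). "
                "Backpropagation made multilayer networks trainable "
                "(Rumelhart, 1986).",
}

records = []
for doc_id, text in reviews.items():
    records.extend(extract_records(doc_id, text))

source_map = aggregate(records)
for row in source_map:
    print(row)
```

Each resulting row is a plain dictionary, so the list could be serialized to JSON and handed to a D3 script; the number of reviews citing a source is one possible proxy for its value, though the abstract does not state which measure the study uses.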
Source
Documentation, Information & Knowledge (《图书情报知识》)
CSSCI
Peking University Core Journal (北大核心)
2018, No. 6, pp. 61-74 (14 pages)
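Funding
An outcome of the National Social Science Fund of China project "Research on Interdisciplinary Collaborative Information Behavior and Characteristics Based on Information Horizons" (14BTQ068)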
Keywords
Interdisciplinary
Information source map
Literature reviews
Information extraction
Entity recognition
Visualization