Abstract
[Objective] This paper analyzes the citation characteristics found in existing data citation practices, categorizes data citation recognition methods, and summarizes the current state of research and future directions. [Methods] Existing data citation recognition methods are divided into three categories: rule-based recognition, supervised machine learning algorithms, and semi-supervised machine learning algorithms. For each category, we review the principles, characteristics, existing problems, performance, and scope of application. [Results] Current techniques are concentrated on supervised machine learning algorithms. Recognition methods that combine data citation behavior recognition with data citation element extraction are a direction for future research. [Limitations] This paper surveys data citation characteristics and existing recognition algorithms at a general level and does not elaborate on the technical details of specific algorithms. [Conclusions] Research on data citation recognition still suffers from domain limitations, a lack of diversity in methods, and insufficient consideration of data citation characteristics, and needs further improvement.
Authors
Zhou Jiayin (周佳茵)
Qian Qing (钱庆)
Tang Mingkun (唐明坤)
Wu Sizhu (吴思竹)
Institute of Medical Information, Chinese Academy of Medical Sciences / Peking Union Medical College, Beijing 100020, China
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
CSCD
Peking University Core Journals (北大核心)
2023, No. 6, pp. 38-49 (12 pages)
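Funding
This work is one of the research outputs of the Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences project (Grant No. 2021-I2M-1-057).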
Keywords
Scientific Data (科学数据)
Data Citation (数据引用)
Data Sharing (数据共享)
Citation Identification (引用识别)