摘要
【目的】分析关系数据库中实体解析技术的研究现状和未来研究方向。【方法】从实体解析的精度和效率两方面展开系统研究。精度方面基于增量式、统计方法和相关信息;效率方面基于分块、字符串相似和其他方法。【结果】最大化实体解析精度和解析效率是实体解析技术研究的主要目标,但在数据源的动态演化、异构性和非精确字符串匹配等方面的研究仍面临重大挑战。【局限】仅从实体解析过程中的精度和效率方面进行探讨,对解析模型本身的特点和局限性关注不足。【结论】本研究有助于更全面了解关系数据库中实体解析的过程、研究现状和未来研究方向。
[Objective] To analyze the research status and future research direction of Entity Resolution (ER) over relational databases. [Methods] Systematical researches are made on the accuracy and efficiency aspects of ER. The accuracy of ER is based on incremental methods, statistical methods and related information. The efficiency of ER is based on blocking, string similarity and other ideas. [Results] Maximizing precision and efficiency are the main goals of ER, but the research on dynamic evolution, heterogeneity of data sources and inexact string matching still faces significant challenges. [Limitations] Only precision and efficiency in the process of ER are discussed, but the characteristics and limitations of ER model don't get the same level of attentions. [Conclusions] This paper gives a comprehensive overview of the process of ER over relational databases, research status and future research direction.
出处
《现代图书情报技术》
CSSCI
2015年第7期37-47,共11页
New Technology of Library and Information Service
基金
国家"十二五"科技支撑计划课题"科技知识组织体系共享平台建设"(项目编号:2011BAH10B03)的研究成果之一
关键词
实体解析
记录链接
关系数据库
Entity resolution
Record linkage
Relation databases