
Survey on Entity Resolution over Relational Databases (关系数据库中实体解析研究综述), cited by: 1
Abstract: [Objective] To analyze the current state of research on Entity Resolution (ER) over relational databases and identify future research directions. [Methods] ER research is reviewed systematically from two angles, accuracy and efficiency. Work on accuracy is grouped into incremental methods, statistical methods, and methods that exploit related information; work on efficiency is grouped into blocking, string similarity, and other methods. [Results] Maximizing accuracy and efficiency is the main goal of ER research, but the dynamic evolution of data sources, their heterogeneity, and inexact string matching still pose significant challenges. [Limitations] The survey considers only the accuracy and efficiency of the ER process and pays less attention to the characteristics and limitations of the ER models themselves. [Conclusions] The survey gives a more comprehensive picture of the ER process over relational databases, the state of research, and future research directions.
Source: New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, 2015, No. 7, pp. 37-47 (11 pages)
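Funding: An outcome of the project "Construction of a Sharing Platform for Scientific and Technical Knowledge Organization Systems" (Grant No. 2011BAH10B03) under the National Science and Technology Support Program of the 12th Five-Year Plan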
Keywords: Entity resolution; Record linkage; Relational databases


