
A Study and Review of Basic Entity Resolution Methods (cited by: 3)

Reviewing Basic Methods of Entity Resolution
Abstract: [Objective] This paper examines the classical entity resolution methods and the reasoning behind them. [Coverage] Google Scholar and CNKI were searched with the keywords "Entity Resolution", "Collective Analysis", "Crowdsourced", "Active Learning", "Privacy-Preserving", and the Chinese term "实体解析" (entity resolution). After topic screening, close reading, and citation tracing, 86 representative papers on entity resolution were selected. [Methods] For each entity resolution method, the paper summarizes its basic idea and illustrates the resolution process with a diagram, then analyzes the key strategies, algorithms, and techniques that existing studies use to implement it. [Results] Entity resolution is a basic operation in data quality management and a key step in discovering the value of data. [Limitations] The evaluation metrics and applications of each entity resolution method are not analyzed in depth. [Conclusions] Although existing entity resolution methods can meet the needs of most applications to some extent, in the big data environment they still face challenges from data heterogeneity, privacy protection, and distributed environments.
Author: Gao Guangshang (高广尚) (Business School, Guilin University of Technology, Guilin 541004, China)
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), indexed in CSSCI, CSCD, and the Peking University Core Journals list; 2019, No. 5, pp. 27-40 (14 pages)
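Funding: One of the research outputs of the National Natural Science Foundation of China project "Research on Incremental Entity Resolution Methods for Data Evolution" (Grant No. 71761008) and the Guangxi Universities Key Research Base of Humanities and Social Sciences project "Research on Data Quality Improvement for Enterprise Data Governance" (Grant No. 16YB010).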
Keywords: Entity Resolution; Collective Analysis; Crowdsourced; Active Learning; Privacy-Preserving