EnAli: entity alignment across multiple heterogeneous data sources (cited by: 2)

Abstract: Entity alignment is the problem of identifying which entities in one data source refer to the same real-world entities in the others. Identifying entities across heterogeneous data sources is paramount to many research fields, such as data cleaning, data integration, information retrieval, and machine learning. The alignment process is not only prohibitively expensive for large data sources, since it involves all tuples from two or more data sources, but it must also handle heterogeneous entity attributes. In this paper, we propose an unsupervised approach, called EnAli, to match entities across two or more heterogeneous data sources. EnAli employs a generative probabilistic model that incorporates heterogeneous entity attributes through exponential-family distributions and handles missing values; it also uses a locality-sensitive hashing scheme to reduce the number of candidate tuples and speed up the alignment process. EnAli is highly accurate and efficient even without any ground-truth tuples. We illustrate the performance of EnAli on re-identifying entities within the same data source, as well as on aligning entities across three real data sources. Our experimental results show that the proposed approach outperforms the comparable baseline.
Source: Frontiers of Computer Science (Chinese title: 中国计算机科学前沿, English edition), indexed in SCIE, EI, and CSCD; 2019, Issue 1, pp. 157-169 (13 pages).
Funding: the National Key Research and Development Program of China (2016YFB1000905); the National Natural Science Foundation of China (Grant Nos. U1401256, 61402177, 61672234, 61402180, and 61232002); and the NSF of Shanghai (14ZR1412600).
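The abstract says EnAli uses locality-sensitive hashing to cut down the candidate tuples, but it does not state the hash family, signature length, or which attributes are hashed. The following is a minimal sketch of LSH blocking under those gaps, assuming MinHash signatures over character 3-grams of each record's concatenated attribute text, split into 16 bands of 4 rows. The function names, the (source, record_id) key layout, and the parameters are hypothetical, and this is not the paper's implementation. Only records that share a band bucket and come from different sources are emitted as candidate pairs, so a downstream probabilistic matcher would score far fewer pairs than the full cross product.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Character k-grams of a lower-cased, whitespace-normalized string."""
    s = " ".join(text.lower().split())
    if len(s) < k:
        return {s}
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def _hash(seed, token):
    """Seeded 64-bit hash; each seed stands in for one random permutation."""
    digest = hashlib.blake2b(f"{seed}:{token}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens, num_perm):
    """MinHash signature: the minimum hash value under each of num_perm hash functions."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_perm)]


def lsh_candidate_pairs(records, bands=16, rows=4):
    """Cross-source candidate pairs that collide in at least one LSH band.

    records maps (source, record_id) -> concatenated attribute text
    (a hypothetical layout chosen for this sketch).
    """
    buckets = defaultdict(set)
    for key, text in records.items():
        sig = minhash_signature(shingles(text), bands * rows)
        for b in range(bands):
            band = tuple(sig[b * rows:(b + 1) * rows])
            buckets[(b, band)].add(key)
    pairs = set()
    for members in buckets.values():
        for x, y in combinations(sorted(members), 2):
            if x[0] != y[0]:  # keep only pairs whose tuples come from different sources
                pairs.add((x, y))
    return pairs


if __name__ == "__main__":
    records = {
        ("A", "1"): "Acme Corporation, 12 Main St",
        ("B", "7"): "ACME Corporation,  12 Main St",
        ("B", "9"): "zzzz qqqq xxxx",
    }
    print(sorted(lsh_candidate_pairs(records)))
```

With b bands of r rows, two records whose shingle sets have Jaccard similarity s become candidates with probability 1 - (1 - s^r)^b, so the b = 16, r = 4 setting places the rough similarity threshold near (1/16)^(1/4) = 0.5. Adding rows makes blocking stricter; adding bands makes it more permissive.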