Web Information Integration Based on Synonymous Entities Recognition

Cited by: 3
Abstract: Accurate and effective integration of massive Web information is an important foundation for analytical applications such as dynamic Web information aggregation, market intelligence analysis, public opinion analysis, and business intelligence. To address the problem that different names can refer to the same entity during data integration, this paper uses the page snippets returned by a search engine to design and implement FSE, a search-engine-based synonymous entity recognition algorithm, and proposes a Web information integration framework based on synonymous entity recognition. Experimental results on a hospital information integration test data set show that FSE outperforms synonymous entity recognition algorithms based on the Variant Dice, Variant Cosine, Variant Jaccard, and Variant Overlap similarity measures.
Source: Computer Systems & Applications, 2015, No. 9, pp. 35-42 (8 pages)
Funding: National High Technology Research and Development Program of China (863 Program) (2012AA011005); National Natural Science Foundation of China (61273297)
Keywords: Web information integration; synonymous entities recognition; similarity computation; search engine
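
The abstract names four similarity measures (Dice, cosine, Jaccard, overlap) as the basis of the baseline algorithms. As a rough illustration only, the sketch below computes the standard set-based forms of these coefficients over the word sets of two groups of search-engine snippets. It is not the FSE algorithm and not the "variant" measures evaluated in the paper, whose definitions are not given in this record. The function names, the whitespace tokenization, and the idea of pooling snippets into one word set per entity name are all assumptions made for the example.

```python
import math


def snippet_tokens(snippets):
    """Pool a list of snippet strings into one set of lowercase words.

    Whitespace tokenization is an assumption; Chinese text would need
    a word segmenter instead.
    """
    tokens = set()
    for snippet in snippets:
        tokens.update(snippet.lower().split())
    return tokens


def dice(a, b):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0


def jaccard(a, b):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def cosine(a, b):
    """Set cosine: |A ∩ B| / sqrt(|A| * |B|)."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if (a and b) else 0.0


def overlap(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    return len(a & b) / min(len(a), len(b)) if (a and b) else 0.0


def similarity_scores(snippets_a, snippets_b):
    """Return all four scores for two entity names' snippet lists."""
    a, b = snippet_tokens(snippets_a), snippet_tokens(snippets_b)
    return {
        "dice": dice(a, b),
        "jaccard": jaccard(a, b),
        "cosine": cosine(a, b),
        "overlap": overlap(a, b),
    }
```

In use, `similarity_scores` would be called with the snippet lists retrieved for two candidate names, and a pair would be treated as synonymous when a chosen score exceeds a threshold. The threshold and the retrieval step are left out because this record does not describe them.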