Abstract
To address entity alignment in the fusion of multi-source heterogeneous knowledge graph data, this paper proposes a domain-knowledge-based set-similarity entity alignment algorithm, developed on real data from an industry e-commerce platform. First, data preprocessing techniques are designed around domain knowledge, including atomizing entity attribute values, unifying terminology, and removing redundancy, to normalize the underlying multi-source heterogeneous e-commerce data and to improve the efficiency and accuracy of data processing. Then, guided by the application of the industry e-commerce knowledge graph, entity pairs are filtered to produce a high-quality candidate set, and the set-similarity measure and the entity-pair ranking method are optimized so that entity pairs can be matched efficiently. Experimental results show that the proposed algorithm effectively improves the accuracy of multi-source heterogeneous data fusion, greatly reduces manual intervention, and may offer new ideas for the development of industry e-commerce.
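The pipeline outlined in the abstract (atomize attribute values, unify terminology, score candidate pairs by set similarity, then rank them) can be illustrated with a minimal Python sketch. The abstract does not specify the paper's similarity function or preprocessing rules, so everything here is an assumption for illustration only: plain Jaccard similarity stands in for the optimized set-similarity measure, and the `SYNONYMS` table, the delimiter set, the `threshold` value, and all function names are hypothetical.

```python
import re

# Hypothetical synonym table standing in for domain knowledge (not from the paper).
SYNONYMS = {"ss": "stainless steel", "stainless": "stainless steel"}


def atomize(value):
    """Split a multi-valued attribute string into normalized atomic tokens."""
    tokens = set()
    for part in re.split(r"[,;/|]+", value):
        token = part.strip().lower()
        if token:
            # Unify terminology through the (assumed) synonym table.
            tokens.add(SYNONYMS.get(token, token))
    return tokens


def entity_tokens(entity):
    """Collect the atomic tokens of all attribute values of one entity."""
    tokens = set()
    for value in entity.values():
        tokens |= atomize(value)
    return tokens


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_candidates(source, targets, threshold=0.3):
    """Keep targets whose similarity reaches the threshold, best match first."""
    src = entity_tokens(source)
    scored = []
    for name, target in targets.items():
        score = jaccard(src, entity_tokens(target))
        if score >= threshold:
            scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


source = {"material": "SS", "size": "M8; 40mm"}
targets = {
    "a": {"material": "stainless steel", "spec": "M8/40mm"},
    "b": {"material": "carbon steel", "spec": "M8/50mm"},
}
ranked = rank_candidates(source, targets)
```

In this toy data, entity `a` shares every token with the source once "SS" is unified to "stainless steel", while entity `b` shares only "m8" and falls below the assumed threshold, so it is filtered out of the candidate set.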
Authors
陈富强
肖明明
韩凯南
任毅
王文文
李克
CHEN Fuqiang; XIAO MingMing; HAN Kainan; REN Yi; WANG WenWen; LI Ke (Smart City College, Beijing Union University, Beijing 100101; China Railway Material Trade Group Co Ltd., Beijing 102308; Luban (Beijing) Electronic Commerce Technology Co Ltd., Beijing 102308)
Source
《高技术通讯》
CAS
2022, Issue 12, pp. 1302-1311 (10 pages)
Chinese High Technology Letters
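Funding
National Natural Science Foundation of China (61972040)
Science and Technology Research and Development Program of China Railway Material Trade Group Luban Company; Scientific Research Program of the Beijing Municipal Education Commission (KM201911417010)
Beijing Union University Intramural Research Project (ZB10202004).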
Keywords
multi-source heterogeneous data
knowledge graph
entity alignment
set similarity
e-commerce