A Cleaning Algorithm for Approximately Duplicated Records in Customer Relationship Database (cited by: 3)
Abstract: A customer relationship database holds a large number of customer records, many of which are approximately duplicated. Detecting, cleaning, and then merging these records improves storage utilization and speeds up record queries. Based on a study of customer records, the paper proposes an algorithm for cleaning approximately duplicated records in a customer relationship database. The algorithm first sorts the records and sets the attribute weights and a record-similarity threshold. It then computes the similarity between adjacent records to decide whether they are approximate duplicates. Finally, it cleans and merges the approximately duplicated records it has detected.
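Author: Guo Wenlong (郭文龙)
Source: Journal of Hengshui University (《衡水学院学报》), 2014, No. 1, pp. 15-17 (3 pages)
Funding: Class A Science and Technology Project of the Fujian Provincial Department of Education (JA12335)
Keywords: customer relationship; approximately duplicated records; cleaning; merging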
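
The abstract names the steps of the algorithm (sort, set attribute weights and a similarity threshold, compare adjacent records, then clean and merge) but gives no concrete similarity measure, weights, threshold value, or merge rule. The following is a minimal Python sketch of that general flow under stated assumptions, not the paper's implementation: field similarity is taken from the standard library's difflib.SequenceMatcher, record similarity is a weighted sum of field similarities, and merging keeps the longest value per attribute. The names WEIGHTS, THRESHOLD, field_similarity, record_similarity, merge, and clean, and all numeric values, are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Hypothetical attribute weights (summing to 1) and similarity threshold.
# The abstract does not state concrete values; these are illustrative only.
WEIGHTS = {"name": 0.5, "phone": 0.3, "address": 0.2}
THRESHOLD = 0.85


def field_similarity(a: str, b: str) -> float:
    """Similarity of two field values in [0, 1] (assumed measure: difflib ratio)."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def record_similarity(r1: dict, r2: dict) -> float:
    """Weighted sum of the per-attribute similarities."""
    return sum(
        w * field_similarity(r1.get(k, ""), r2.get(k, ""))
        for k, w in WEIGHTS.items()
    )


def merge(group: list[dict]) -> dict:
    """Assumed merge rule: keep the longest value of each attribute."""
    return {k: max((r.get(k, "") for r in group), key=len) for k in WEIGHTS}


def clean(records: list[dict], sort_key: str = "name") -> list[dict]:
    # Step 1: sort so that likely duplicates become neighbours.
    ordered = sorted(records, key=lambda r: r.get(sort_key, "").strip().lower())
    cleaned: list[dict] = []
    group: list[dict] = []
    for rec in ordered:
        # Step 2: compare each record only with the record just before it.
        if group and record_similarity(group[-1], rec) >= THRESHOLD:
            group.append(rec)
        else:
            # Step 3: a run of duplicates has ended, so merge it into one record.
            if group:
                cleaned.append(merge(group))
            group = [rec]
    if group:
        cleaned.append(merge(group))
    return cleaned
```

Calling clean(records) on a list of dictionaries with name, phone, and address keys returns one merged record per run of adjacent near-duplicates. Because only neighbours are compared, the choice of sort key determines which duplicates can be found; two similar records that sort far apart are never compared in this sketch.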