
Research on a Data Cleaning Model Based on Rule and Data Learning
Abstract: Data quality is critical to management information systems. However, user misspellings, data-entry errors, system upgrades, and similar causes introduce a variety of data quality problems. The purpose of data cleaning is to detect dirty data and repair it. Because current cleaning tools lack flexibility and extensibility, this paper proposes a general-purpose data cleaning model based on rule learning and data learning. The model implements two key techniques: dynamic rule learning and dynamic data learning. Through rule matching and a feedback learning process, it dynamically selects the best cleaning rule. Through field learning and metatable learning, it initializes dynamic data. Experiments show that applying the model ensures the quality of dynamic data and improves the flexibility and extensibility of current cleaning tools.
Author: Shi Shaomin (石少敏)
Source: Journal of Shaanxi Institute of Education (《陕西教育学院学报》), 2011, No. 3, pp. 89-93 (5 pages)
Funding: Shaanxi Institute of Education Research Fund Project (10KJ040)
Keywords: data cleaning; cleaning rule; rule feedback; data quality; data learning; rule learning
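The abstract does not say how rule matching and rule feedback are implemented. The Python sketch below is one possible reading under explicit assumptions, not the author's method. Each cleaning rule is assumed to be a (matcher, repair) pair. Rule matching keeps only the rules whose matcher fires on a value. "Best" is taken to mean the highest Laplace-smoothed acceptance rate learned from user feedback. All names (`CleaningRule`, `RuleSelector`, `default_rules`) and both example rules are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CleaningRule:
    """Hypothetical rule: a matcher that flags dirty values and a repair function."""
    name: str
    matches: Callable[[str], bool]
    repair: Callable[[str], str]
    accepted: int = 0  # times feedback accepted this rule's repair
    rejected: int = 0  # times feedback rejected it

    def score(self) -> float:
        # Laplace-smoothed acceptance rate (an assumption); untried rules start at 0.5.
        return (self.accepted + 1) / (self.accepted + self.rejected + 2)


class RuleSelector:
    """Assumed selection scheme: rule matching followed by feedback-weighted choice."""

    def __init__(self, rules: List[CleaningRule]):
        self.rules = rules

    def candidates(self, value: str) -> List[CleaningRule]:
        # Rule matching: keep only rules whose matcher fires on this value.
        return [r for r in self.rules if r.matches(value)]

    def best_rule(self, value: str) -> Optional[CleaningRule]:
        cands = self.candidates(value)
        if not cands:
            return None
        # max() returns the first maximal element, so list order breaks ties.
        return max(cands, key=lambda r: r.score())

    def clean(self, value: str) -> Tuple[str, Optional[CleaningRule]]:
        rule = self.best_rule(value)
        if rule is None:
            return value, None  # no rule matched: treat the value as clean
        return rule.repair(value), rule

    def feedback(self, rule: CleaningRule, accepted: bool) -> None:
        # Feedback learning: record whether the chosen repair was acceptable.
        if accepted:
            rule.accepted += 1
        else:
            rule.rejected += 1


def default_rules() -> List[CleaningRule]:
    # Two illustrative whitespace rules; both fire on "  Main   Street ".
    return [
        CleaningRule("strip_edges", lambda v: v != v.strip(), lambda v: v.strip()),
        CleaningRule(
            "collapse_spaces",
            lambda v: re.search(r"\s{2,}", v) is not None,
            lambda v: re.sub(r"\s+", " ", v).strip(),
        ),
    ]


if __name__ == "__main__":
    selector = RuleSelector(default_rules())
    cleaned, rule = selector.clean("  Main   Street ")
    print(cleaned, "via", rule.name)  # Main   Street via strip_edges
    selector.feedback(rule, accepted=False)  # a reviewer rejects that repair
    cleaned, rule = selector.clean("  Main   Street ")
    print(cleaned, "via", rule.name)  # Main Street via collapse_spaces
```

In this sketch, a single rejection lowers `strip_edges` from 0.5 to 1/3, so the next matching value is routed to `collapse_spaces`. This is the kind of dynamic "best rule" selection the abstract describes, though the paper's actual scoring may differ.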