Classification detection method for uncompleted records based on bit operation (cited by: 3)
Abstract: Handling missing data is an important part of data cleaning. A classification detection method for incomplete records based on bit operations is proposed. Incomplete records are defined, and records are divided into four classes: complete, incomplete but acceptable, incomplete and to be corrected, and incomplete and to be deleted. A hierarchical classification flow for these classes is given. A binary representation of a record is defined. Standard binary representation sets for each class are generated from samples of incomplete records, the priority of each standard binary representation is set by the number of times it appears in the samples, and the binary representations in the standard set of the incomplete-and-delete class are merged into expressions. Records are classified by bit operations, and the binary representation sets are refined step by step by handling binary representations that were not detected. The binary representation of an incomplete record determines how the record is processed further. An application example validates the effectiveness of the method.
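The detection step summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact encoding or the order of its classification flow, so everything here is an assumption made for illustration: the `FIELDS` schema is hypothetical, bit i is set to 1 when field i is missing, the merged delete expressions are modeled as `(mask, value)` pairs, and the check order (delete, then acceptable, then correct) is a guess.

```python
from collections import Counter

# Hypothetical schema; the paper's real attributes are not given in the abstract.
FIELDS = ["id", "name", "phone", "email"]


def to_bits(record):
    """Encode a record as an int: bit i is 1 when FIELDS[i] is missing (assumed encoding)."""
    bits = 0
    for i, field in enumerate(FIELDS):
        if record.get(field) in (None, ""):
            bits |= 1 << i
    return bits


def build_standard_set(samples):
    """Return the distinct missing-value patterns of labelled samples,
    ordered by how often each appears (most frequent first)."""
    counts = Counter(to_bits(r) for r in samples)
    return [pattern for pattern, _ in counts.most_common()]


def classify(record, acceptable, correct, delete_rules):
    """Classify one record with bit operations.

    acceptable, correct: lists of exact patterns, already in priority order.
    delete_rules: merged expressions as (mask, value) pairs; a record matches
    when the bits selected by mask equal value.
    """
    bits = to_bits(record)
    if bits == 0:
        return "complete"
    for mask, value in delete_rules:
        if (bits & mask) == value:
            return "incomplete-delete"
    for pattern in acceptable:
        if (bits ^ pattern) == 0:  # XOR is zero only for an exact match
            return "incomplete-acceptable"
    for pattern in correct:
        if (bits ^ pattern) == 0:
            return "incomplete-correct"
    # Unseen pattern: a candidate for extending the standard sets.
    return "undetected"
```

A record that comes back as `"undetected"` corresponds to the undetected binary representations mentioned in the abstract: once it has been labelled by hand, its pattern can be added to the matching standard set so that later records with the same pattern are classified automatically.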
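Source: Systems Engineering and Electronics (《系统工程与电子技术》), indexed in EI, CSCD, and the Peking University Core Journals list; 2010, No. 11, pp. 2489-2492 (4 pages)
Funding: Supported by the China Postdoctoral Science Foundation (20090461425) and the Jiangsu Province Postdoctoral Research Funding Program (0901014B)
Keywords: data quality; data cleaning; missing data; incomplete record; classification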