Abstract
Lexicographical order dependencies specify order relationships among lists of attributes in data. In practice, data are often large and contain errors. This paper studies distributed data repairing for lexicographical order dependencies, with the goal of modifying the data so that the given order dependencies are satisfied. We design and implement distributed repair algorithms on the Spark platform, and conduct extensive experiments to verify the effectiveness and efficiency of the approach.
Authors
Guo Naiwang (郭乃网)
Qin Sheng (覃晟)
Tan Zijing (谈子敬)
Cao Manliang (曹满亮)
Affiliations: State Grid Shanghai Municipal Electric Power Company, Shanghai 200437, China; Fudan University, Shanghai 200433, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2023, No. 9, pp. 37-42, 108 (7 pages in total)
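Funding
Key R&D Program of the Ministry of Science and Technology (2018YFB1402600)
Shanghai Science and Technology Commission Project (19DZ2252800)
State Grid Shanghai Science and Technology Project (52094020001A)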
Keywords
Data repairing
Lexicographical order dependency
Distributed computing