Abstract
Compared with traditional storage media, the main difficulty of DeoxyriboNucleic Acid (DNA) data storage is that insertion and deletion errors in sequenced reads make data recovery highly challenging. For DNA storage that uses forward error-correcting codes with one-base error-correcting capability, this paper proposes a bucket allocation strategy to improve decoding accuracy and efficiency. First, all identifiable DNA codes in the reads of each cluster are searched, and the corresponding valid code is determined for each according to the one-base error-correcting capability. Second, each identifiable DNA code is assigned to its best coding position (i.e., bucket) according to its position in the read. Finally, the consensus code of each bucket is determined by majority voting. Simulation results show that the average decoding accuracy exceeds 94% at a sequencing depth of 20X when the error rate is 5% or 10%, and exceeds 90% at a sequencing depth of 60X when the error rate is 15%.
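The three steps named in the abstract (code search with one-base correction, bucket assignment by read position, majority voting) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the fixed codeword length, the homopolymer toy codebook, and the rule that maps a read position to the nearest codeword slot are all assumptions made for illustration. The abstract does not specify how the best bucket is chosen.

```python
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def nearest_valid_code(kmer, codebook):
    """Return the single codeword within Hamming distance 1 of kmer, else None.

    Assumes the codebook has minimum distance >= 3, so at most one
    codeword can lie within distance 1 of any k-mer.
    """
    hits = [c for c in codebook if hamming(kmer, c) <= 1]
    return hits[0] if len(hits) == 1 else None


def decode_cluster(reads, codebook, n_buckets):
    """Hypothetical bucket decoder: one consensus codeword per bucket."""
    k = len(next(iter(codebook)))          # all codewords share length k
    votes = defaultdict(Counter)           # bucket index -> codeword counts
    for read in reads:
        for pos in range(len(read) - k + 1):
            code = nearest_valid_code(read[pos:pos + k], codebook)
            if code is None:
                continue                   # window is not an identifiable code
            # Assumed bucket rule: nearest codeword slot to this position.
            bucket = (pos + k // 2) // k
            if bucket < n_buckets:
                votes[bucket][code] += 1
    # Majority vote per bucket; None marks a bucket that received no votes.
    return [votes[b].most_common(1)[0][0] if votes[b] else None
            for b in range(n_buckets)]


# Toy codebook (minimum distance 4), chosen for readability only.
codebook = {"AAAA", "CCCC", "GGGG", "TTTT"}
reads = [
    "AAAACCCCGGGG",   # error-free read
    "AAACCCCGGGG",    # one deletion
    "AAAACCTCCGGGG",  # one insertion
    "AAAACCCCGTGG",   # one substitution
]
print(decode_cluster(reads, codebook, 3))  # ['AAAA', 'CCCC', 'GGGG']
```

In this sketch a single insertion or deletion shifts later codewords by one base, which still falls within the assumed half-codeword tolerance of the bucket rule. Longer reads with accumulated shifts would need a more careful position model.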
Authors
ZAN Xiangzhen (昝乡镇)
YAO Xiangyu (姚翔宇)
XU Peng (许鹏)
CHEN Zhihua (陈智华)
SHI Xiaolong (石晓龙)
LI Shudong (李树栋)
LIU Wenbin (刘文斌)
Institution of Computational Science and Technology, Guangzhou University, Guangzhou 510006, China; Cyberspace Institute of Advanced Technology, Guangzhou University, Guangzhou 510006, China
Source
Journal of Electronics & Information Technology (《电子与信息学报》)
EI
CSCD
Peking University Core Journals (北大核心)
2022, Issue 10, pp. 3650-3656 (7 pages)
Funding
National Natural Science Foundation of China (62072128, 61876047, 62002079).
Keywords
Storage decoding method
DeoxyriboNucleic Acid (DNA) storage
Insertion error
Deletion error
Substitution error