TahcoRoll: fast genomic signature profiling via thinned automaton and rolling hash (cited by: 1)

Abstract Objectives: Genomic signatures such as k-mers have become one of the most prominent ways to describe genomic data. As a result, many real-world applications, such as the construction of de Bruijn graphs in genome assembly, have benefited from recognizing genomic signatures. An efficient approach to genomic signature profiling is therefore essential for handling high-throughput sequencing reads. However, most existing approaches recognize only fixed-size k-mers, while many studies have shown the importance of considering variable-length k-mers. Methods: In this paper, we present a novel genomic signature profiling approach, TahcoRoll, which extends the Aho–Corasick algorithm (AC) to the task of profiling variable-length k-mers. We first group nucleotides into two clusters and represent each cluster with a single bit. The rolling hash technique is then used to encode signatures and read patterns for efficient matching. Results: In extensive experiments, TahcoRoll significantly outperforms state-of-the-art k-mer counters and can process reads from different sequencing platforms on a budget desktop computer. Conclusions: The single-thread version of TahcoRoll is as efficient as the eight-thread version of the state-of-the-art counter JellyFish, while the eight-thread TahcoRoll outperforms the eight-thread JellyFish by at least a factor of four.
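The two encoding steps named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which nucleotides share a cluster, so the grouping in `BIT` is an assumption, and the Aho–Corasick automaton is replaced here by a plain per-length dictionary lookup to keep the example short. The names `encode`, `windows`, and `profile` are hypothetical, and reads are assumed to contain only uppercase A, C, G, T.

```python
from collections import defaultdict

# Assumed two-cluster grouping (not stated in the abstract).
BIT = {"A": 0, "C": 0, "G": 1, "T": 1}
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = 4
MOD = (1 << 61) - 1  # large prime modulus for the polynomial hash


def encode(seq):
    """Return (bit pattern, polynomial hash) of a nucleotide string."""
    bits = h = 0
    for c in seq:
        bits = (bits << 1) | BIT[c]
        h = (h * BASE + CODE[c]) % MOD
    return bits, h


def windows(read, k):
    """Yield (start, bits, hash) for each length-k window, updated in O(1)."""
    if len(read) < k:
        return
    bits, h = encode(read[:k])
    mask = (1 << k) - 1
    top = pow(BASE, k - 1, MOD)  # weight of the character leaving the window
    yield 0, bits, h
    for i in range(1, len(read) - k + 1):
        out, inc = read[i - 1], read[i + k - 1]
        bits = ((bits << 1) | BIT[inc]) & mask
        h = ((h - CODE[out] * top) * BASE + CODE[inc]) % MOD
        yield i, bits, h


def profile(signatures, reads):
    """Count occurrences of variable-length signatures in the reads."""
    # Dictionary lookup stands in for the automaton: k -> (bits, hash) -> signatures.
    index = defaultdict(lambda: defaultdict(list))
    for s in set(signatures):
        index[len(s)][encode(s)].append(s)
    counts = {s: 0 for s in signatures}
    for read in reads:
        for k, table in index.items():
            for i, bits, h in windows(read, k):
                for s in table.get((bits, h), ()):
                    if read[i:i + k] == s:  # guard against hash collisions
                        counts[s] += 1
    return counts


if __name__ == "__main__":
    print(profile(["ACG", "CG", "TTTT"], ["ACGACG", "TTTTT"]))
```

The bit pattern acts as a coarse filter and the rolling hash as a finer one; the final string comparison keeps the counts exact even if two different windows share both values.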
Source: Medical Review [医学评论(英文)], 2021, Issue 2, pp. 114-125 (12 pages)
Funding: NSF DGE-1829071, NIH R35-HL135772, NIH/NIBIB R01-EB027650.