With the wide application of DNA sequencing technology, DNA sequences continue to be generated in growing numbers on the Sanger sequencing platform. SeqMan (in the LaserGene package) is an excellent program with an easy-to-use graphical user interface (GUI) for assembling Sanger sequences into contigs. However, as data sets grow, larger sample sets and more sequenced loci complicate contig assembly because running SeqMan requires a considerable number of manual operations. Here, we present the 'autoSeqMan' software program, which assembles contigs automatically using the SeqMan scripting language. Two main modules are available, 'Classification' and 'Assembly'. Classification first carries out the preprocessing work, and Assembly then generates a SeqMan script that assembles contigs for the classified files one after another. In a comparison with manual operation, we showed that autoSeqMan saved substantial time in the preprocessing and assembly of Sanger sequences. We hope this tool will be useful for those who have large sample sets to analyze but little programming experience. It is freely available at https://github.com/Sun-Yanbo/autoSeqMan.
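The abstract does not say how the Classification module sorts its input files, so the following is only a minimal sketch of one plausible preprocessing step: grouping Sanger trace files by sample and locus before assembly. The file-naming scheme (`sample_locus_direction.ab1`) and the function name `group_traces` are assumptions made for illustration, not part of autoSeqMan.

```python
from collections import defaultdict
from pathlib import PurePath


def group_traces(filenames, sep="_"):
    """Group trace files by (sample, locus).

    Assumes a hypothetical naming scheme 'sample_locus_direction.ab1';
    this is NOT taken from autoSeqMan itself.
    """
    groups = defaultdict(list)
    for name in filenames:
        path = PurePath(name)
        # Skip anything that is not an ABI trace file.
        if path.suffix.lower() != ".ab1":
            continue
        parts = path.stem.split(sep)
        # Skip files that do not follow the assumed naming scheme.
        if len(parts) < 3:
            continue
        sample, locus = parts[0], parts[1]
        groups[(sample, locus)].append(name)
    # Sort each group so forward/reverse reads come out in a stable order.
    return {key: sorted(files) for key, files in groups.items()}
```

Each resulting group would correspond to one contig to assemble; how the groups are then turned into a SeqMan script is specific to the tool and is not sketched here.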
Problems in biological sequence analysis have great theoretical and practical value in modern bioinformatics. Numerous algorithms are used to solve these problems, and the algorithms for a single problem are similar and different in complex ways, which makes it difficult for researchers to select the appropriate one. To address this situation, the paper combines the formal partition-and-recur method, component technology, domain engineering, and generic programming into a method for developing a family of biological sequence analysis algorithms. It designs highly trustworthy, reusable domain algorithm components and then assembles them to generate specific biological sequence analysis algorithms. An experiment developing a family of dynamic-programming-based LCS algorithms shows that the proposed method improves the reliability, understandability, and development efficiency of particular algorithms.
Funding (autoSeqMan): supported by the National Natural Science Foundation of China (31671326) and the Youth Innovation Promotion Association, Chinese Academy of Sciences.
Funding (biological sequence analysis algorithm family): supported by the National Natural Science Foundation of China (No. 62062039) and the Natural Science Foundation of Jiangxi Province (Nos. 20202BAB202024 and 20212BAB202017).