As early as 1975, Shamos and Hoey gave the first O(n lg n)-time divide-and-conquer algorithm (the SH algorithm for short) for the problem of finding the closest pair of points. In one combination step, the Euclidean distances between 3n pairs of points need to be computed, so the overall complexity of computing distances is 3n lg n. Since computing a distance is more costly than other basic operations, we consider how to improve the SH algorithm in terms of the complexity of computing distances. In 1998, Zhou, Xiong and Zhu improved the SH algorithm by reducing this complexity to 2n lg n. In this paper, we make a further improvement: the overall complexity of computing distances is reduced to (3n lg n)/2, which is only half that of the SH algorithm.
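To make the quantity being optimized concrete, the following is a minimal Python sketch of the textbook divide-and-conquer closest-pair algorithm, instrumented to count Euclidean distance computations. It is not the exact SH combination step, and it does not reproduce the improvement described in the abstract; the function name `closest_pair`, the 7-neighbour strip scan, and the brute-force base case for three or fewer points are assumptions taken from the standard presentation of the technique.

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]


def closest_pair(points: List[Point]) -> Tuple[float, int]:
    """Return (smallest pairwise distance, number of distance computations).

    Textbook divide and conquer; illustrative only, not the paper's variant.
    """
    calls = 0

    def dist(p: Point, q: Point) -> float:
        nonlocal calls
        calls += 1  # every Euclidean distance evaluation is counted here
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def solve(px: List[Point]) -> Tuple[float, List[Point]]:
        # px is sorted by x; the second return value is px sorted by y.
        n = len(px)
        if n <= 3:
            best = math.inf
            for i in range(n):
                for j in range(i + 1, n):
                    best = min(best, dist(px[i], px[j]))
            return best, sorted(px, key=lambda p: p[1])

        mid = n // 2
        mid_x = px[mid][0]
        d_left, left_y = solve(px[:mid])
        d_right, right_y = solve(px[mid:])
        best = min(d_left, d_right)

        # Combination step: merge by y, then scan the strip around the split line.
        py = list(heapq.merge(left_y, right_y, key=lambda p: p[1]))
        strip = [p for p in py if abs(p[0] - mid_x) < best]
        for i, p in enumerate(strip):
            for q in strip[i + 1:i + 8]:
                if q[1] - p[1] >= best:
                    break
                best = min(best, dist(p, q))
        return best, py

    if len(points) < 2:
        return math.inf, 0
    best, _ = solve(sorted(points))
    return best, calls
```

Calling `closest_pair(pts)` on a list of `(x, y)` tuples returns both the minimum distance and the counter, so the counter can be compared against the n(n-1)/2 evaluations of a brute-force scan.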
Tools for pairwise bio-sequence alignment have long played a central role in computational biology, and several algorithms for bio-sequence alignment have been developed. The Smith-Waterman algorithm, based on dynamic programming, is considered the most fundamental alignment algorithm in bioinformatics. However, the existing parallel Smith-Waterman algorithm needs a large amount of memory, which limits the size of the sequences it can handle. As the volume of biological sequence data expands rapidly, the memory requirement of the existing parallel Smith-Waterman algorithm has become a critical problem. To solve this problem, we develop a new parallel bio-sequence alignment algorithm based on the divide-and-conquer strategy, named the PSW-DC algorithm. In our algorithm, we first partition the query sequence into several subsequences and distribute them among the processors. Each processor then compares its subsequence with the whole subject sequence in parallel, using the Smith-Waterman algorithm, and produces an interim result. Finally, the optimal alignment between the query sequence and the subject sequence is obtained through a special combination and extension method. The memory required by our algorithm is reduced significantly in comparison with existing ones. We also develop a key combination and extension technique, named the C&E method, to manipulate the interim results and obtain the final sequence alignment. We implement the new parallel bio-sequence alignment algorithm, PSW-DC, on a cluster parallel system.
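The first two stages described above (partitioning the query and scoring each piece against the whole subject) can be sketched in Python as follows. This is only an illustration under stated assumptions: the scoring values (`match=2`, `mismatch=-1`, a linear `gap=-1`), the helper names `smith_waterman_score` and `partition_query`, and the serial loop standing in for the processors are all hypothetical. The C&E combination and extension step is not specified in the abstract and is therefore not reproduced here.

```python
from typing import List


def smith_waterman_score(query: str, subject: str,
                         match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score, keeping only two rows of the DP matrix.

    Scoring parameters are assumed values for illustration.
    """
    prev = [0] * (len(subject) + 1)
    best = 0
    for qc in query:
        curr = [0]
        for j, sc in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if qc == sc else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def partition_query(query: str, parts: int) -> List[str]:
    """Split the query into at most `parts` contiguous chunks (hypothetical helper)."""
    size = max(1, -(-len(query) // parts))  # ceiling division, at least 1
    return [query[i:i + size] for i in range(0, len(query), size)]


def interim_scores(query: str, subject: str, parts: int) -> List[int]:
    """Score each query chunk against the whole subject.

    In a cluster setting each chunk would go to its own processor;
    here the chunks are processed one after another.
    """
    return [smith_waterman_score(chunk, subject)
            for chunk in partition_query(query, parts)]
```

Each chunk needs only two rows of length `len(subject) + 1` for scoring, which hints at why splitting the query lowers per-processor memory; recovering the full optimal alignment from the interim results is the job of the combination and extension step, which this sketch leaves out.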
Funding: This work is supported by the National Natural Science Foundation of China (Grant No. 60496321) and the Shanghai Science and Technology Development Fund (Grant No. 025115032).