期刊文献+

一种基于位置信息的高效DNA序列挖掘算法 被引量:1

AN EFFICIENT POSITION-BASED DNA SEQUENCE MINING ALGORITHM
下载PDF
导出
摘要 类Apriori算法在产生频繁模式时需要多次扫描数据库,并且产生大量的候选集;Free Span和Prefix Span等基于投影数据库的算法在产生频繁模式时会产生大量的投影数据库,占用很多内存空间,这些都造成了很大的冗余。针对以往序列挖掘算法存在的不足,提出一种高效的序列挖掘算法——基于位置信息的序列挖掘算法PBSMA(Position-Based Sequence Mining Algorithm)。PBSMA算法通过记录频繁子序列的位置信息来减少对数据库的扫描,利用位置信息逐渐扩大频繁模式的长度,并且借鉴关联矩阵的思想和Prefix Span算法中前缀的概念,深度优先去寻找更长的关键模式。实验结果证明,无论在时间还是空间上,PBSMA算法都比Prefix Span算法更高效。 Similar to Apriori algorithm in generating frequent patterns need to scan the database several times, and generate a large number of candidate sets. Algorithms based on the projection database, such as FreeSpan and PrefixSpan , in generating frequent patterns will produce a large number of projection database, taking up a lot of memory space, which have caused a lot of redundancy. Aiming at the shortcomings of the previous sequence mining algorithms, an efficient sequence mining algorithm named PBSMA is proposed in this paper. The PBSMA reduces the scanning of the database by recording the position information of frequent subsequences, and gradually enlarges the length of the frequent patterns by using the position information. The algorithm uses the idea of association matrix and the concept of prefix in PrefixSpan algorithm to search for a longer key pattern. The experimental results show that the PBSMA is more efficient than PrefixSpan algorithm both in time and space.
出处 《计算机应用与软件》 2017年第6期230-235,308,共7页 Computer Applications and Software
基金 国家自然科学基金项目(61273293)
关键词 序列挖掘 DNA序列 位置信息 关联矩阵 前缀 Sequence mining DNA sequence Position information Association matrix Prefix
  • 相关文献

参考文献13

二级参考文献101

共引文献106

同被引文献7

引证文献1

二级引证文献2

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部