Abstract
To address the shortcomings of traditional methods for analyzing DNA sequence similarity, this paper proposes a new similarity analysis algorithm for DNA sequences based on information quantity. Each DNA sequence is treated as a signal sequence over the symbol set {A, C, G, T}. All DNA sequences to be compared are then combined into an information system whose attribute values are the characters A, C, G and T. For the resulting database system, the paper introduces the concepts of information quantity, joint information quantity, conditional information quantity and mutual information quantity of DNA sequences. It discusses their properties and derives several relations among them, and on this basis builds a DNA sequence similarity analysis model. Simulation results show that the method analyzes DNA sequence similarity quickly and effectively. It also copes well with two difficulties: the very large number of bases in DNA, and the fact that DNA sequences of different species have different lengths.
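The abstract names four information measures but does not give their definitions. The sketch below shows how such measures relate to one another. It uses standard Shannon entropy as an assumed stand-in for the paper's "information quantity", and each sequence is treated as one attribute column of the information system. It also makes a loud simplification: sequences are compared position by position over their common prefix. That is not necessarily how the paper handles unequal lengths. All function names, and the normalized score 2·I(X;Y)/(H(X)+H(Y)), are hypothetical and chosen for illustration only.

```python
from collections import Counter
from math import log2


def entropy(column):
    """Shannon entropy H(X) in bits of the symbols in one attribute column."""
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def joint_entropy(x, y):
    """Joint entropy H(X, Y) over position-wise symbol pairs (x[i], y[i])."""
    return entropy(list(zip(x, y)))


def conditional_entropy(x, y):
    """Conditional entropy H(X | Y) = H(X, Y) - H(Y)."""
    return joint_entropy(x, y) - entropy(y)


def mutual_information(x, y):
    """Mutual information I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - joint_entropy(x, y)


def similarity(seq1, seq2):
    """Hypothetical similarity score in [0, 1]: 2 * I(X; Y) / (H(X) + H(Y)).

    Assumption: only the common prefix of the two sequences is compared,
    position by position; this is a simplification for illustration.
    """
    n = min(len(seq1), len(seq2))
    x, y = seq1[:n], seq2[:n]
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both prefixes are constant (or empty): no information to share.
        return 1.0 if x == y else 0.0
    return 2 * mutual_information(x, y) / (hx + hy)


if __name__ == "__main__":
    # Illustrative toy sequences, not data from the paper.
    seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "TTGCAATGCA"}
    names = sorted(seqs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(a, b, round(similarity(seqs[a], seqs[b]), 4))
```

Mutual information measures statistical dependence, not identity. A consistent relabeling of bases (for example, ACGT versus TGCA) therefore scores 1.0 under this sketch. A model intended to reward literal base matches would need an additional term.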
Source
Application Research of Computers
CSCD
Peking University Core Journal
2013, No. 5, pp. 1381-1384 (4 pages)
Funding
National Natural Science Foundation of China (60603062, 61100194)
Hunan Provincial Key Construction Discipline Program
Hunan Provincial Education Science 12th Five-Year Plan Project (XJK011BXJ004)
Scientific Research Project of the Hunan Provincial Department of Education (11C1184)
Keywords
DNA sequence comparison
database system
information quantity
similarity