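Influenza viruses are divided into three types: A, B, and C. Of these, influenza A virus is the most lethal and causes severe disease in humans. This paper builds a new time series model for influenza A virus DNA sequences, the CGR (Chaos Game Representation) radian sequence. CGR coordinates are used to convert influenza A virus DNA sequences into CGR radian sequences, and the long-memory ARFIMA model is introduced to fit them. Ten randomly selected H1N1 sequences and ten H3N2 sequences all exhibit long-range correlation and are fitted well by the model. The two kinds of sequence can also tentatively be identified by different ARFIMA models: H1N1 by an ARFIMA(0,d,5) model and H3N2 by an ARFIMA(1,d,1) model.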
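The long-memory component of an ARFIMA(p, d, q) model is the fractional differencing operator (1 - B)^d, where B is the backshift operator. The following is a minimal sketch of that operator only, not the fitting procedure used in the paper; the function names `frac_diff_weights` and `frac_diff` are hypothetical and chosen for illustration.

```python
def frac_diff_weights(d, n):
    """Return the first n coefficients of the power series of (1 - B)^d.

    Uses the standard recursion w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k.
    """
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply (1 - B)^d to a series, truncating the filter at the series start."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]
```

With d = 1 the filter reduces to ordinary first differencing. For 0 < d < 0.5 the weights decay slowly (hyperbolically), which is what lets the model capture long-range correlation.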
Chaos game representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to determine the coordinates of their positions in a continuous space. This distribution of positions has two features: it is unique, and the source sequence can be recovered from the coordinates, so that the distance between positions may serve as a measure of similarity between the corresponding sequences. A CGR-walk model is proposed based on the CGR coordinates of DNA sequences. The CGR coordinates are converted into a time series, and a long-memory ARFIMA(p, d, q) model, where ARFIMA stands for autoregressive fractionally integrated moving average, is introduced into DNA sequence analysis. The model is applied to real CGR-walk sequence data from ten genomic sequences. Remarkably long-range correlations are uncovered in the data, and the ARFIMA(p, d, q) model fits the data reasonably well.
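The iterative mapping and the recoverability property described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the conventional corner assignment A = (0, 0), C = (0, 1), G = (1, 1), T = (1, 0) and a start point at the centre of the unit square, and the names `CORNERS`, `cgr_coordinates`, and `decode_last_point` are hypothetical.

```python
# Assumed corner assignment (a common convention; the abstract does not specify one).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(seq, start=(0.5, 0.5)):
    """Map a DNA string to its list of CGR points.

    Each point is the midpoint between the previous point and the corner
    assigned to the current nucleotide.
    """
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def decode_last_point(point, length):
    """Recover the source sequence from the final CGR point.

    The quadrant of a point gives the last nucleotide; undoing the midpoint
    step gives the previous point. Floating-point precision limits this to
    short sequences (roughly 50 nucleotides with 64-bit floats).
    """
    x, y = point
    bases = []
    for _ in range(length):
        cx = 1.0 if x > 0.5 else 0.0
        cy = 1.0 if y > 0.5 else 0.0
        bases.append(next(b for b, c in CORNERS.items() if c == (cx, cy)))
        x, y = 2.0 * x - cx, 2.0 * y - cy
    return "".join(reversed(bases))
```

For example, `decode_last_point(cgr_coordinates("GATTACA")[-1], 7)` returns the original string, which illustrates why each sequence maps to a unique position. How the coordinates are then turned into a CGR-walk time series is defined in the paper itself and is not reproduced here.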
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 60575038) and the Natural Science Foundation of Jiangnan University, China (Grant No. 20070365).