Coronavirus disease 2019 (COVID-19) has spread worldwide and has not yet been effectively controlled. The spike (S) protein on the surface of SARS-CoV-2 plays an important role in viral transmission, so its analysis is valuable for disease prevention and immunization. This paper analyzes the base distribution of SARS-CoV-2 genome sequences and the mutations of the S protein gene. The genome sequences are examined with several visualization methods. Multiple S protein gene sequences are selected, then compared and aligned with BLAST and MEGA6. Their information entropy is then computed, and its distribution is visualized and analyzed. The results show that the overall base distribution of SARS-CoV-2 genomes is symmetric. Because only a small number of S protein sequences were selected and they vary little, the visualized entropy distributions show only a few characteristic clusters.
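The two quantities named in the abstract, base distribution and information entropy of aligned sequences, can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned to equal length (for example, exported from MEGA6), uses Shannon entropy in bits per alignment column, and skips gaps and ambiguous symbols. The function names and the toy sequences are hypothetical and are not taken from the paper, whose exact entropy definition and visualization method may differ.

```python
import math
from collections import Counter

BASES = set("ACGT")  # assumption: only unambiguous nucleotides are counted


def base_composition(seq):
    """Return the fraction of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in "ACGT"}


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; gaps and N are skipped."""
    counts = Counter(c.upper() for c in column if c.upper() in BASES)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum_i p_i * log2(1 / p_i), which is 0 for a fully conserved column
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def alignment_entropy(aligned):
    """Per-column entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col) for col in zip(*aligned)]


if __name__ == "__main__":
    # Toy alignment for illustration only, not real S gene data.
    toy = ["ATGC", "ATGA", "ATTA", "A-TA"]
    print(base_composition(toy[0]))
    print(alignment_entropy(toy))
```

Columns with entropy 0 are fully conserved across the selected sequences, while higher values mark positions where the sequences differ. With few sequences and little variation, most columns are expected to sit at or near 0.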
Hans Journal of Computational Biology