Abstract
To analyze the similarity of protein sequences more accurately and reasonably, this paper combines the arrangement of amino acids with their physicochemical properties and proposes a clustering model based on a self-organizing map (SOM) neural network. First, the protein sequence is transformed into a 5-letter sequence using the method of Wang and Wang, and the 5 letters are distributed uniformly on the unit circle centered at the origin, which gives the position coordinates x, y of the protein sequence. Next, these coordinates are combined with 3 physicochemical indices of the amino acid, so that each amino acid is represented by a 5-dimensional vector. Finally, the SOM neural network is used to cluster the resulting protein vectors. In the numerical experiments at the end of the paper, similarity analysis is carried out on the mitochondrial NADH dehydrogenase protein sequences of 9 different species, and the results verify the validity of the model to a certain extent.
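The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the Wang and Wang grouping, the 3 physicochemical indices, or how residue vectors are combined into a protein vector, so `GROUP_OF`, `PROPS`, the averaging in `encode_protein`, and the 1-D SOM layout are all hypothetical placeholders.

```python
import math
import random

# HYPOTHETICAL placeholders: not the Wang and Wang grouping and not measured
# physicochemical values. They only make the sketch runnable.
GROUP_OF = {"A": 0, "R": 1, "D": 2, "C": 3, "G": 4}
PROPS = {"A": (0.1, 0.2, 0.3), "R": (0.9, 0.8, 0.7), "D": (0.5, 0.1, 0.9),
         "C": (0.3, 0.6, 0.2), "G": (0.0, 0.4, 0.5)}

def circle_coords(n_letters=5):
    """Place n_letters points uniformly on the unit circle centred at the origin."""
    return [(math.cos(2 * math.pi * k / n_letters),
             math.sin(2 * math.pi * k / n_letters)) for k in range(n_letters)]

def encode_residue(aa, group_of=GROUP_OF, props=PROPS):
    """Return the 5-dimensional vector (x, y, p1, p2, p3) for one amino acid."""
    x, y = circle_coords()[group_of[aa]]
    return (x, y) + tuple(props[aa])

def encode_protein(seq, group_of=GROUP_OF, props=PROPS):
    """ASSUMPTION: summarise a protein as the mean of its residue vectors."""
    vecs = [encode_residue(aa, group_of, props) for aa in seq]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def best_unit(weights, v):
    """Index of the SOM unit whose weight vector is closest to v."""
    return min(range(len(weights)),
               key=lambda j: sum((w - a) ** 2 for w, a in zip(weights[j], v)))

def train_som(data, n_units=3, epochs=50, lr0=0.5, seed=0):
    """Minimal 1-D SOM: units on a line, neighbourhood radius 1 then 0."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)           # linearly decaying learning rate
        radius = 1 if epoch < epochs // 2 else 0  # shrink the neighbourhood
        for v in data:
            bmu = best_unit(weights, v)
            for j in range(n_units):
                if abs(j - bmu) <= radius:
                    # move the unit towards the input vector
                    weights[j] = [w + lr * (a - w) for w, a in zip(weights[j], v)]
    return weights

# Toy usage with made-up sequences; proteins sharing a unit fall in one cluster.
proteins = ["AARD", "AARG", "CCGG", "CGGC"]
vectors = [encode_protein(p) for p in proteins]
som = train_som(vectors)
clusters = [best_unit(som, v) for v in vectors]
```

Averaging residue vectors discards sequence order, so it is only one simple way to obtain fixed-length inputs for the SOM; a different sequence-level descriptor may well be what the paper uses.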
Source
《中国海洋大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2016, No. 7, pp. 130-135 (6 pages)
Periodical of Ocean University of China
Funding
National Natural Science Foundation of China (61303145)
Fundamental Research Funds for the Central Universities (201362031)