
Graphical Transformation and Similarity Clustering Analysis for Protein Sequences
Abstract: Based on hydrophobicity and relative molecular mass, the 20 amino acids are first divided into 8 classes and placed on a circle at different angular intervals. A coordinate space is then established by partitioning along the z-axis, so that each amino acid corresponds to a point. The amino acids of a protein sequence are mapped into this coordinate system in sequence order and connected, giving a 3D model of the sequence. The 3D model is converted into a 20-dimensional matrix diagram, which is used to analyze the number of amino acid pairs in the sequence and the similarity between sequences. The spatial coordinates are further converted into a numerical sequence, and a discrete Fourier transform (DFT) is applied to obtain the power spectrum of the original protein sequence. Power spectra of different lengths are evenly scaled to length m, the length of the longest sequence in the dataset. The Euclidean distance between the scaled power spectra is used to measure sequence similarity, and a phylogenetic tree is constructed from these distances. Finally, the method is validated on several datasets: the clustering results are consistent with the matrix-diagram analysis and compare favorably with those of other algorithms, indicating that the method is effective for studying protein similarity.
Authors: PAN Yi-hong (潘以红); QIAN Dong (钱东); ZHU Ping (朱平) (School of Science, Jiangnan University, Wuxi 214122, Jiangsu, China)
Affiliation: School of Science, Jiangnan University
Source: Life Science Research (《生命科学研究》), CAS, CSCD, 2018, No. 3, pp. 191-200, 228 (11 pages)
Funding: National Natural Science Foundation of China (11271163)
Keywords: graphical transformation of protein sequence; numerical representation of protein sequence; difference analysis of matrix diagrams; discrete Fourier transform (DFT); phylogenetic tree
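
The spectral comparison step described in the abstract (DFT, power spectrum, scaling to a common length m, Euclidean distance) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not specify how the spatial coordinates become numbers or how the scaling is done, so the input sequences here are hypothetical placeholder values and linear interpolation stands in for the scaling step. The function names `power_spectrum`, `stretch_to_length`, and `distance_matrix` are invented for this sketch. A naive O(n²) DFT keeps the block dependency-free; in practice a library FFT would normally be used.

```python
import cmath
import math


def power_spectrum(signal):
    """Return |X(k)|^2 for k = 0..n-1, computed with a naive DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def stretch_to_length(ps, m):
    """Resample a spectrum to m points by linear interpolation.

    Assumption: this is only a stand-in for the scaling step named
    in the abstract, whose exact formula is not given there.
    """
    n = len(ps)
    if n == m:
        return list(ps)
    if n == 1 or m == 1:
        return [ps[0]] * m
    out = []
    for j in range(m):
        pos = j * (n - 1) / (m - 1)      # position on the original index axis
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(ps[lo] * (1 - frac) + ps[hi] * frac)
    return out


def distance_matrix(signals):
    """Pairwise Euclidean distances between length-matched power spectra."""
    spectra = [power_spectrum(s) for s in signals]
    m = max(len(p) for p in spectra)     # longest spectrum in the dataset
    scaled = [stretch_to_length(p, m) for p in spectra]
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v))) for v in scaled]
        for u in scaled
    ]


# Hypothetical numeric sequences standing in for encoded proteins.
example = [
    [1.0, 2.0, 3.0, 4.0],
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 0.0, 1.0, 5.0, 3.0, 1.0],
]
distances = distance_matrix(example)
```

The resulting distance matrix is the kind of input a standard tree-building procedure (for example, hierarchical clustering) would take to produce a phylogenetic tree. The abstract does not say which procedure the authors used.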
