Abstract
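Using a technical route that combines bioinformatics with experimental validation, the cDNA of the novel human gene C17orf32 was successfully cloned (GenBank accession numbers AY074907 and TPA: BK000260). The complete open reading frame (ORF, 31-657 bp) of the C17orf32 cDNA (627 bp) was found to differ by only one base from positions 25-651 of the ORF (25-807 bp) of the hypothetical human gene LOC124919. Three lines of evidence, namely RT-PCR verification with cDNA sequencing, BLAST searches of the human expressed sequence tag (EST) database, and analysis of gene organization rules, all support the C17orf32 sequence and do not support the coding sequence of LOC124919. The C17orf32 genomic sequence is 4.610 kb long and contains 6 exons and 5 introns; the full-length cDNA is 1 679 bp, and the ORF spans all 6 exons. The translation start site of the ORF conforms to the Kozak rule, there is an in-frame stop codon upstream of the ORF start codon, and the ORF is followed by 2 polyadenylation signals and a poly(A) tail. The successful cloning of C17orf32 indicates that the model reference sequence XM_058865 of LOC124919, the gene predicted by the NCBI Genome Annotation Project in December 2001 to encode the hypothetical human protein XP_058865, contains an error: a base G is wrongly inserted between positions 406 and 407 of the C17orf32 cDNA. As a result, the protein sequence encoded by the ORF changes after amino acid 125, downstream of the insertion site, yielding a polypeptide of 260 amino acids. Computer-annotated coding sequences of the human genome should therefore be treated with caution.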
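The coordinate arithmetic behind these claims can be checked with a short sketch. The ORF coordinates and the insertion position below are taken from the abstract; the function names and the 15-base toy sequence are hypothetical illustrations, not the authors' code or the real C17orf32 sequence.

```python
# Coordinates quoted in the abstract (1-based, inclusive, on the C17orf32 cDNA).
ORF_START, ORF_END = 31, 657
INSERT_AFTER = 406  # the extra G in XM_058865 lies between positions 406 and 407


def orf_length(start, end):
    """Length in bases of an ORF given 1-based inclusive coordinates."""
    return end - start + 1


def complete_codons_before(orf_start, insert_after):
    """Number of whole codons that end at or before the insertion point."""
    return (insert_after - orf_start + 1) // 3


def codons(seq):
    """Split a sequence into complete codons, dropping a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_base(seq, after, base):
    """Insert `base` immediately after 1-based position `after`."""
    return seq[:after] + base + seq[after:]


def first_changed_codon(ref, alt):
    """1-based index of the first codon that differs, or None if none differ."""
    for index, (a, b) in enumerate(zip(codons(ref), codons(alt))):
        if a != b:
            return index + 1
    return None


# Hypothetical toy ORF (NOT the real sequence): ATG GCT GAA ACC TGA
TOY_ORF = "ATGGCTGAAACCTGA"
toy_mutant = insert_base(TOY_ORF, 7, "G")

if __name__ == "__main__":
    n = orf_length(ORF_START, ORF_END)
    print(n, n // 3 - 1)                    # 627 bases, 208 residues plus a stop codon
    m = orf_length(25, 807)
    print(m, m // 3 - 1)                    # 783 bases, 260 residues plus a stop codon
    print(complete_codons_before(ORF_START, INSERT_AFTER))  # 125 codons unaffected
    print(first_changed_codon(TOY_ORF, toy_mutant))         # 3 in the toy example
```

Under these assumptions, the insertion falls one base into codon 126, which is consistent with the statement that the protein sequence changes after amino acid 125, and the 25-807 bp ORF of LOC124919 corresponds to the 260-residue polypeptide.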
A novel human gene encoding a protein of 208 amino acids has been identified and characterized; HGNC has assigned it the symbol C17orf32 and the name chromosome 17 open reading frame 32. The full-length cDNA of 1 679 bp for C17orf32 was cloned through a BLAST search of public databases, following the identification of a 1 119 bp cDNA obtained by fully automated EST assembly with the SiClone software (created by Chen RS and Ling LJ, to be released on their website) on a ShenWei Ⅳ supercomputer. Structurally, C17orf32 has one calcitonin/CGRP/IAPP family signature from amino acid 16 to 169, one dihydroorotase signature from amino acid 43 to 117, one tyrosine kinase phosphorylation site from amino acid 68 to 75, and one bipartite nuclear localization signal from amino acid 28 to 45. These motifs suggest that the gene may be biologically important. Genomic organization analyses show that the C17orf32 gene comprises six exons, ranging in size from 43 to 1 101 bp, and five introns, ranging in size from 163 to 1 124 bp, and spans 4.61 kb. All of the exon/intron boundaries are consistent with the GT/AG rule, and consensus sequences surrounding the splice boundaries are found as well. The C17orf32 gene is located on accession NT_010808.7 on human chromosome 17 and is linked only with LOC124919, a hypothetical human gene with an 889 bp mRNA encoding the hypothetical protein XP_058865 of 260 amino acids, supported by XM_058865. The sequence of LOC124919 has not been verified experimentally. Furthermore, the full-length ORF of 627 bp (31 to 657 bp of the cDNA) was cloned by RT-PCR from single-stranded cDNA of the human gastric adenocarcinoma cell line MGC803 and sequenced; nucleotide sequencing shows it to be fully identical to the sequence obtained by in silico cloning. Thus, the in silico cloning of the C17orf32 gene (GenBank accession numbers AY074907 and TPA: BK000260), which was identified solely by bioinformatics analyses, is supported by the experimentally determined sequence.
The full length cDNA sequence of 1 679 bp exhibits very good overall
Source
《生物化学与生物物理进展》
SCIE
CAS
CSCD
Peking University Core Journals
2002, Issue 4, pp. 543-549 (7 pages)
Progress in Biochemistry and Biophysics
Funding
Supported by the China Postdoctoral Science Foundation (2 92 0 0 112 1760 80 62 0 0 0)