Expressed sequence tags (ESTs), which have accumulated in large numbers, provide a valuable resource for finding new genes and disease-relevant genes, and for recognizing alternative splicing variants, SNP sites, etc. The prerequisite for this research is to correctly identify the ESTs related to a given gene sequence. Based on an analysis of alignments between known gene sequences and ESTs in public databases, several measures, including Identity Check, Gap Check, Inclusion Check and Length Check, have been introduced to judge whether an EST alignment is related to a gene sequence. A computational program, EDSAc1.0, has been developed to identify true EST alignments and the exon regions of query gene sequences. When tested on the human gene sequences in the standard dataset HMR195 and evaluated with the standard measures of gene-prediction performance, EDSAc1.0 identifies protein-coding regions with a specificity of 0.997 and a sensitivity of 0.88 at the nucleotide level, outperforming its counterpart TAP. A web server for EDSAc1.0 is available at http://infosci.hust.edu.cn.
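The abstract names four checks but does not state their criteria. The sketch below shows one plausible way to combine them into a single accept/reject filter, assuming simple threshold readings: Identity Check as a minimum fraction of identical columns, Gap Check as a maximum gap length on the EST side, Inclusion Check as a minimum fraction of the EST covered by the alignment, and Length Check as a minimum aligned length. The record layout (`EstAlignment`), the function names and all cutoff values are hypothetical and are not EDSAc1.0's actual parameters.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only; EDSAc1.0's actual criteria are not given here.
MIN_IDENTITY = 0.95   # Identity Check: fraction of identical aligned columns
MAX_GAP = 10          # Gap Check: longest gap allowed on the EST side (nt)
MIN_INCLUSION = 0.90  # Inclusion Check: fraction of the EST covered by the alignment
MIN_LENGTH = 100      # Length Check: minimum number of aligned columns


@dataclass
class EstAlignment:
    """Summary of one EST-to-gene alignment (hypothetical record layout)."""
    est_length: int      # full EST length in nucleotides
    aligned_length: int  # number of alignment columns
    matches: int         # number of identical columns
    longest_gap: int     # longest single gap on the EST side, in nucleotides
    est_start: int       # 0-based start of the aligned region on the EST
    est_end: int         # exclusive end of the aligned region on the EST


def identity_check(a: EstAlignment) -> bool:
    return a.aligned_length > 0 and a.matches / a.aligned_length >= MIN_IDENTITY


def gap_check(a: EstAlignment) -> bool:
    return a.longest_gap <= MAX_GAP


def inclusion_check(a: EstAlignment) -> bool:
    return a.est_length > 0 and (a.est_end - a.est_start) / a.est_length >= MIN_INCLUSION


def length_check(a: EstAlignment) -> bool:
    return a.aligned_length >= MIN_LENGTH


def is_gene_related(a: EstAlignment) -> bool:
    """Accept an alignment only if it passes all four checks."""
    checks = (identity_check, gap_check, inclusion_check, length_check)
    return all(check(a) for check in checks)


good = EstAlignment(est_length=400, aligned_length=390, matches=385,
                    longest_gap=2, est_start=5, est_end=395)
weak = EstAlignment(est_length=500, aligned_length=120, matches=100,
                    longest_gap=0, est_start=0, est_end=120)
print(is_gene_related(good), is_gene_related(weak))  # True False
```

Keeping each check as a separate predicate makes it easy to see which criterion rejected an alignment (here `weak` fails both the identity and inclusion thresholds) and to tune one cutoff without touching the others.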
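The nucleotide-level figures quoted above can be illustrated with the definitions commonly used in gene-prediction benchmarks, assuming the paper follows that convention: sensitivity Sn = TP / (TP + FN) and specificity Sp = TP / (TP + FP), where TP, FP and FN count coding nucleotides predicted correctly, predicted wrongly and missed. Note that this "specificity" is not the TN-based definition used in clinical statistics. The helper names and the example intervals below are hypothetical, and intervals are assumed to be 0-based and half-open.

```python
def coding_mask(intervals, length):
    """Mark every nucleotide covered by a 0-based, half-open interval."""
    mask = [False] * length
    for start, end in intervals:
        for i in range(start, end):
            mask[i] = True
    return mask


def nucleotide_sn_sp(predicted, actual, length):
    """Nucleotide-level sensitivity and specificity for one gene sequence.

    Sn = TP / (TP + FN): fraction of true coding nucleotides that were predicted.
    Sp = TP / (TP + FP): fraction of predicted coding nucleotides that are truly coding.
    """
    pred = coding_mask(predicted, length)
    real = coding_mask(actual, length)
    tp = sum(p and r for p, r in zip(pred, real))
    fp = sum(p and not r for p, r in zip(pred, real))
    fn = sum(r and not p for p, r in zip(pred, real))
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp


actual = [(10, 50), (80, 120)]     # annotated coding exons (80 nt)
predicted = [(10, 50), (90, 120)]  # exons inferred from EST alignments (70 nt)
print(nucleotide_sn_sp(predicted, actual, length=150))  # (0.875, 1.0)
```

In this toy case every predicted coding nucleotide is truly coding (Sp = 1.0), but 10 coding nucleotides of the second exon are missed (Sn = 70/80), which mirrors the pattern reported for EDSAc1.0: very high specificity with somewhat lower sensitivity, as expected when exons are only detected where EST evidence exists.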