Funding: This work was supported by the Hi-Tech Research and Development Program of China (863 Program) and the National Natural Science Foundation of China (No. 30300262).
Abstract: Microsatellites, or simple sequence repeats (SSRs), have been found in most organisms during the last decade. As large-scale sequence data accumulate, especially data that can be searched for microsatellites, developing these markers is becoming more convenient. In view of the importance of SSR applications, available CDS (coding sequences) or ESTs (expressed sequence tags) of several eukaryotic species were used to study the frequency and density of various types of microsatellites. Based on a survey of CDS or EST sequences amounting to 66.6 Mb in silkworm, 37.2 Mb in fly, 20.8 Mb in mosquito, 60.0 Mb in mouse, 34.9 Mb in zebrafish and 33.5 Mb in Caenorhabditis elegans, the frequency of SSRs was 1/1.00 Kb in silkworm, 1/0.77 Kb in fly, 1/1.03 Kb in mosquito, 1/1.21 Kb in mouse, 1/1.25 Kb in zebrafish and 1/1.38 Kb in C. elegans. The overall average SSR frequency of these species is 1/1.07 Kb. Hexanucleotide repeats (64.5% to 76.6%) are the most abundant class of SSR in the investigated species, followed by trimeric, dimeric, tetrameric, monomeric and pentameric repeats. Furthermore, A-rich repeats are predominant in each type of SSR, whereas G-rich repeats are rare in the coding regions.
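The abstract does not say how the overall average of 1/1.07 Kb was obtained. As a rough check, the sketch below assumes each reported frequency is one SSR per N Kb of surveyed sequence, derives an implied SSR count per species, and pools the six surveys. Under that assumption the pooled spacing comes out near 1.07 Kb, while an unweighted mean of the six spacings gives about 1.11 Kb. The function names and the Mb-to-Kb conversion factor are illustrative choices, not taken from the paper.

```python
from typing import Dict, Tuple

# Surveyed sequence size (Mb) and reported SSR spacing (one SSR per N Kb),
# copied from the abstract.
SURVEY: Dict[str, Tuple[float, float]] = {
    "silkworm": (66.6, 1.00),
    "fly": (37.2, 0.77),
    "mosquito": (20.8, 1.03),
    "mouse": (60.0, 1.21),
    "zebrafish": (34.9, 1.25),
    "C. elegans": (33.5, 1.38),
}


def pooled_ssr_interval_kb(survey: Dict[str, Tuple[float, float]]) -> float:
    """Total surveyed length divided by the total implied SSR count."""
    total_kb = 0.0
    total_ssrs = 0.0
    for size_mb, interval_kb in survey.values():
        size_kb = size_mb * 1000.0  # assumption: 1 Mb = 1000 Kb
        total_kb += size_kb
        total_ssrs += size_kb / interval_kb  # implied number of SSRs
    return total_kb / total_ssrs


def simple_mean_interval_kb(survey: Dict[str, Tuple[float, float]]) -> float:
    """Unweighted mean of the six per-species spacings, for comparison."""
    intervals = [interval_kb for _, interval_kb in survey.values()]
    return sum(intervals) / len(intervals)


if __name__ == "__main__":
    print(f"pooled spacing: 1 SSR per {pooled_ssr_interval_kb(SURVEY):.2f} Kb")
    print(f"simple mean:    1 SSR per {simple_mean_interval_kb(SURVEY):.2f} Kb")
```

The pooled figure weights each species by the amount of sequence surveyed, so the large silkworm and mouse data sets influence it more than the mosquito data set does.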
Abstract: Computational gene structure prediction, which is valuable for finding new genes and understanding the composition of genomes, plays a very important role in many kinds of genome projects. For eukaryotic gene structures, however, the prediction accuracy of existing methods is still limited. This paper presents a method of predicting eukaryotic gene structures based on multilevel optimization. The complicated problem of predicting gene structure in a eukaryotic DNA sequence containing multiple genes can be decomposed into a series of sub-problems at several levels of decreasing complexity: the gene level (single-exon gene, multi-exon gene), the element level (exon, intron, etc.), and the feature level (functional site signals, codon usage preference, etc.). On the basis of this decomposition, a multilevel model for the prediction of complex gene structures is created by a multilevel optimization process, in which the models dealing with sub-problems at the low complexity level are first optimized separately and then optimally combined to form models for the sub-problems at the higher complexity levels. Based on the multilevel model, a dynamic programming algorithm is designed to search for optimal gene structures in DNA sequences, and a new program, GeneKey (1.0), for the prediction of eukaryotic gene structures is developed. Test results on widely used datasets demonstrate that the prediction accuracies of GeneKey (1.0) at the nucleotide level, exon level and gene level are all higher than those of the well-known program GENSCAN. A web server of GeneKey (1.0) is available at http://infosci.hust.edu.
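The abstract gives no details of GeneKey's scoring models or its dynamic programming recurrence, so the following is only a toy illustration of the general idea: choosing the best-scoring sequence of structural labels by dynamic programming. It assumes that per-position scores for each label are already available (standing in for the lower-level models) and charges a fixed penalty whenever the label changes. The names `best_labeling`, `LABELS` and `switch_penalty` are hypothetical, and this is a generic Viterbi-style recurrence rather than the algorithm used in GeneKey.

```python
from typing import Dict, List, Tuple

# Hypothetical label set for the toy example.
LABELS = ("exon", "intron", "intergenic")


def best_labeling(
    scores: List[Dict[str, float]], switch_penalty: float = 1.0
) -> Tuple[float, List[str]]:
    """Return the highest total score and the label path that achieves it.

    scores[i][label] is an assumed, precomputed score for giving position i
    that label; switch_penalty is subtracted each time the label changes.
    """
    if not scores:
        return 0.0, []

    # best[label] = best score of any path ending in `label` at this position
    best = {lab: scores[0][lab] for lab in LABELS}
    back: List[Dict[str, str]] = []  # back-pointers for path recovery

    for pos_scores in scores[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for lab in LABELS:
            # Pick the best predecessor, paying the penalty on a label change.
            prev, value = max(
                ((p, best[p] - (0.0 if p == lab else switch_penalty))
                 for p in LABELS),
                key=lambda item: item[1],
            )
            new_best[lab] = value + pos_scores[lab]
            pointers[lab] = prev
        back.append(pointers)
        best = new_best

    # Trace back from the best final label.
    last = max(LABELS, key=lambda lab: best[lab])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


if __name__ == "__main__":
    toy = [
        {"exon": 2.0, "intron": 0.0, "intergenic": 0.0},
        {"exon": 2.0, "intron": 0.0, "intergenic": 0.0},
        {"exon": 0.0, "intron": 3.0, "intergenic": 0.0},
        {"exon": 0.0, "intron": 3.0, "intergenic": 0.0},
        {"exon": 2.0, "intron": 0.0, "intergenic": 0.0},
    ]
    print(best_labeling(toy, switch_penalty=1.0))
```

A real gene finder would score whole exons and introns, enforce reading-frame and splice-site constraints, and combine several feature models; the sketch only shows how a dynamic programming search picks one globally optimal labeling instead of deciding each position independently.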
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 30830066 and 30771151) and the National Basic Research Program (Grant No. 2005CB724600).
Abstract: The eukaryotic genome contains varying numbers of non-coding RNA (ncRNA) genes. "Computational RNomics" takes a multidisciplinary approach, drawing on fields such as information science, to resolve the structure and function of ncRNAs. Here, we review the main issues in "Computational RNomics": data storage and management, ncRNA gene identification and characterization, and ncRNA target identification and functional prediction. We also summarize the main methods and current content of "Computational RNomics".
Funding: Supported by the Knowledge Innovation Program of the Chinese Academy of Sciences (KZCX2-YW-153,154); the National Natural Science Foundation of China (40772006, 40625006, 40632010 and J0630967); the State Key Laboratory of Palaeobiology and Stratigraphy, Nanjing Institute of Geology and Palaeontology (20102108 and 20101104); and the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry.
Abstract: Chuaria is one of the few globally distributed macrofossil pioneers documented in the Precambrian. It is perhaps the most controversial fossil in terms of its affinity, despite more than one hundred years of study. Many mutually exclusive affinities have been suggested for this frequently encountered fossil. Although it is often treated as a multicellular alga, this interpretation remains inconclusive because cellular structures have not been unambiguously demonstrated. In this paper the cellular details of Chuaria are clearly revealed for the first time. The cell walls in Chuaria suggest that it is a multicellular eukaryotic alga, in agreement with the latest biogeochemical analyses. Different thicknesses of cell walls suggest primary cellular differentiation in this organism. Membrane-like structures within the cells (the first to be reported in Precambrian fossils) imply a eukaryotic nature. This study partially resolves the century-long controversy over the affinity of Chuaria, and makes Chuaria one of the few recognized multicellular eukaryotes before the Neoproterozoic glaciation.