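Abstract: Copy number variations (CNVs) mainly refer to deletions, insertions, duplications and similar changes affecting DNA segments larger than 1 kb. CNVs are widespread in the genomes of humans and other mammals. This article reviews the effects of CNVs on human diseases and the technologies used to detect them, and discusses the prospects for applying CNVs in breeding animals for disease resistance. Because copy number variations have a critical influence on disease resistance and susceptibility, biotechnological approaches are expected to make CNVs usable in marker-assisted selection in livestock, in fine mapping of QTLs, and in breeding superior disease-resistant animal breeds.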
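The abstract mentions CNV detection technologies without naming a specific one. The sketch below illustrates one common family, read-depth analysis. It compares per-window read depth in a sample against a reference and flags windows whose log2 ratio suggests a loss or gain. It then merges consecutive flagged windows and keeps runs of 1 kb or more. A heterozygous deletion halves depth (log2 ratio near -1), and a single-copy gain in a diploid region gives about log2(3/2) ≈ 0.58. The cutoffs of -0.8 and 0.5 are therefore illustrative assumptions. The window size and the function names `call_cnvs` and `classify_window` are also hypothetical. Real callers add GC correction, segmentation and statistical models.

```python
import math

def classify_window(ratio, del_cutoff, dup_cutoff):
    """Label one window from its log2 depth ratio (cutoffs are illustrative)."""
    if ratio <= del_cutoff:
        return "deletion"
    if ratio >= dup_cutoff:
        return "duplication"
    return None

def call_cnvs(sample_depth, reference_depth, window_size=500,
              del_cutoff=-0.8, dup_cutoff=0.5, min_length=1000):
    """Toy read-depth CNV caller: merge consecutive windows with the same label
    and keep runs spanning at least min_length bp."""
    # A 0.5 pseudocount avoids log2(0) for windows with no reads.
    states = [classify_window(math.log2((s + 0.5) / (r + 0.5)), del_cutoff, dup_cutoff)
              for s, r in zip(sample_depth, reference_depth)]
    calls, i = [], 0
    while i < len(states):
        if states[i] is None:
            i += 1
            continue
        j = i
        while j + 1 < len(states) and states[j + 1] == states[i]:
            j += 1
        start, end = i * window_size, (j + 1) * window_size
        if end - start >= min_length:
            calls.append((start, end, states[i]))
        i = j + 1
    return calls

sample = [30, 30, 15, 14, 15, 30, 30, 60, 30]
reference = [30] * 9
print(call_cnvs(sample, reference))  # [(1000, 2500, 'deletion')]
```

In this made-up example, the single high-depth window at 3500-4000 bp is not reported. It spans only 500 bp, below the 1 kb size used in the CNV definition above.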
Funding: Supported by the National Natural Science Foundation of China (Grant No. 31871706), the Department of Agriculture of Guangdong Province (2018-36), the Science and Technology Program of Guangdong Province (Grant No. 2019B030316006), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA24040201), and the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Grant No. 2017141).
Abstract: The Information Commons for Rice (IC4R) database is a collection of 18 million single nucleotide polymorphisms (SNPs) identified by resequencing 5152 rice accessions. Although IC4R offers an ultra-high-density rice variation map, these raw SNPs are not readily usable by the public. The SNPs serve different research uses in rice, including population genetics, evolutionary analysis, association studies and genomic breeding. To support these uses, the raw genotypic data of these 18 million SNPs were processed through unified bioinformatics pipelines. The outcomes were used to develop a daughter database of IC4R, SnpReady for Rice (SR4R). SR4R presents four reference SNP panels:

- 2,097,405 hapmapSNPs obtained after data filtration and genotype imputation;
- 156,502 tagSNPs selected by linkage disequilibrium-based redundancy removal;
- 1180 fixedSNPs selected from genes exhibiting selective sweep signatures;
- 38 barcodeSNPs selected by DNA fingerprinting simulation.

SR4R thus offers a highly efficient rice variation map that combines reduced SNP redundancy with extensive data describing the genetic diversity of rice populations. In addition, SR4R provides rice researchers with a web interface for browsing all four SNP panels and using online toolkits. Through the same interface, researchers can retrieve the original data and scripts for a variety of population genetics analyses on local computers. SR4R is freely available to academic users at http://sr4r.ic4r.org/.
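SR4R's tagSNP panel comes from LD-based redundancy removal, but the abstract does not give the algorithm or threshold. The sketch below shows one common approach, greedy tagging with pairwise r², and is not SR4R's actual pipeline. It assumes genotypes coded as 0/1/2 allele dosages. It uses the squared Pearson correlation of dosages as an r² proxy rather than haplotype-based r². The r² threshold of 0.8, the SNP IDs and the function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype-dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP has no defined correlation
    return sxy * sxy / (sxx * syy)

def select_tag_snps(genotypes, threshold=0.8):
    """Greedy tagging: repeatedly pick the SNP that covers the most untagged SNPs.

    genotypes: dict mapping SNP ID -> list of 0/1/2 dosages (same sample order).
    A SNP is covered by a tag if their r^2 >= threshold (every SNP covers itself).
    """
    ids = list(genotypes)
    covers = {s: {t for t in ids
                  if s == t or r_squared(genotypes[s], genotypes[t]) >= threshold}
              for s in ids}
    untagged, tags = set(ids), []
    while untagged:
        # Ties go to the SNP listed first, keeping the result deterministic.
        best = max(ids, key=lambda s: (len(covers[s] & untagged), -ids.index(s)))
        tags.append(best)
        untagged -= covers[best]
    return tags

genotypes = {
    "rs1": [0, 1, 2, 0, 1, 2],
    "rs2": [0, 1, 2, 0, 1, 2],   # identical to rs1
    "rs3": [2, 1, 0, 2, 1, 0],   # perfectly anti-correlated with rs1 (r^2 = 1)
    "rs4": [0, 0, 1, 2, 2, 1],   # uncorrelated with the others
}
print(select_tag_snps(genotypes))  # ['rs1', 'rs4']
```

Here rs2 and rs3 carry no information beyond rs1, so two tags represent all four SNPs. That reduction is the kind of redundancy removal the tagSNP panel aims for.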
Funding: Supported by the Key Construction Program of the National "985" Project of China (Phase II), the Natural Science Foundation of Guangdong Province (Grant No. 031673), and the Guangzhou Municipal Science and Technology Foundation (Grant Nos. 2002Z3-C7191, 2004Z3-C7501).
Abstract: Genomic variation is the genetic basis of phenotypic diversity among individuals, including variation in disease susceptibility and drug response. The greatest promise of the International HapMap is to provide roadmaps for identifying genetic variants that predispose to complex diseases. Single nucleotide polymorphisms (SNPs) are the fundamental elements of the HapMap. SNP allele frequency is one of the major factors shaping the resulting HapMap, because linkage disequilibrium (LD) calculation, haplotype construction and tagging SNP (tagSNP) selection all depend on it. The minor allele frequency (MAF) cutoff used in making the map therefore has profound effects on the map's resolution. To date most researchers have adopted their own cutoff thresholds. There has been little evaluation, based on real datasets, of how different cutoffs affect HapMap resolution. To assess the implications of different cutoff values, we analyzed our own data for the centromeric genes on Chromosome 15 in Chinese Han and Tibetan populations. We applied MAF cutoffs of ≥0.01 (0.01 group), ≥0.05 (0.05 group) and ≥0.10 (0.10 group), and constructed a HapMap from each dataset. The resolution, study power and cost-effectiveness of the maps were compared. Our results show that the 0.01 threshold provides the greatest power (P = 0.019 in Han and P = 0.029 in Tibetans for 0.01 vs. 0.05). It also detects the most population-specific haplotypes (P = 0.012 for 0.01 vs. 0.05). However, in the regions studied, the 0.05 cutoff did not significantly increase power over the 0.10 cutoff (P = 0.191 in Han; P = 1.000 in Tibetans). Nor did it improve resolution for population-specific haplotypes (P = 0.592). Furthermore, the 0.05 and 0.10 values produced the same figures for tagging efficiency, LD block number, LD length, study power and cost savings in the Tibetan population. These r…
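The three groups compared in this study differ only in the MAF cutoff applied before the map is built. A minimal sketch of that filtering step follows. It assumes diploid genotypes coded as 0/1/2 copies of one allele, with `None` marking missing calls. The SNP IDs, genotype counts and function names are made up for illustration.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded 0/1/2; missing calls (None) are skipped."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1 - p)

def filter_by_maf(snps, cutoff):
    """Keep SNP IDs whose MAF is >= cutoff (matching the '>=0.01'-style groups)."""
    return [sid for sid, g in snps.items() if minor_allele_frequency(g) >= cutoff]

# Hypothetical data: 20 individuals (40 chromosomes) per SNP.
snps = {
    "snpA": [1] + [0] * 19,        # MAF = 1/40 = 0.025
    "snpB": [1] * 3 + [0] * 17,    # MAF = 3/40 = 0.075
    "snpC": [1] * 6 + [0] * 14,    # MAF = 6/40 = 0.15
    "snpD": [2] * 20,              # monomorphic, MAF = 0
}
groups = {cutoff: filter_by_maf(snps, cutoff) for cutoff in (0.01, 0.05, 0.10)}
for cutoff, kept in groups.items():
    print(cutoff, kept)
```

Each stricter cutoff keeps a subset of the SNPs retained by the looser one. LD, haplotypes and tagSNPs computed downstream therefore rest on progressively fewer, more common variants.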
Abstract: Common variants explain little of the variance of most common diseases. This has prompted large-scale sequencing studies to understand the contribution of rare variants to these diseases. Imputing rare variants from genome-wide genotyping arrays offers a cost-efficient strategy for reaching the sample sizes required for adequate statistical power. To estimate how well rare variants are imputed, we imputed 153 individuals using IMPUTE version 2. Each individual was genotyped on 3 different arrays (317k, 610k and 1 million single nucleotide polymorphisms (SNPs)). Imputation used two reference panels: HapMap2 and the 1000 Genomes pilot March 2010 release (1KGpilot). More than 94% and 84% of all SNPs reached acceptable accuracy (info > 0.4) in HapMap2- and 1KGpilot-based imputation, respectively. For rare variants (minor allele frequency (MAF) < 5%), the proportion of well-imputed SNPs increased as MAF increased from 0.3% to 5% across all 3 genome-wide association study (GWAS) datasets. For SNPs with MAF from 0.3% to 5%, the proportion of well-imputed SNPs was 69%, 60% and 49% for the 1M, 610k and 317k arrays, respectively. None of the very rare variants (MAF < 0.3%) were well imputed. We conclude that when the reference panel is small, the imputation accuracy of rare variants increases with the density of the genome-wide genotyping array. Variants with lower MAF are more difficult to impute. These findings have important implications for the design and replication of large-scale sequencing studies.
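The summary statistic used above is the share of SNPs with info > 0.4 within each MAF range. A minimal sketch of that tabulation follows. It assumes the per-SNP MAF and info score have already been extracted from the IMPUTE2 output; parsing is not shown. The (MAF, info) values and the function name are invented, and the bin edges simply mirror the ranges quoted in the abstract.

```python
def well_imputed_fraction(snps, bins, info_cutoff=0.4):
    """Fraction of SNPs with info > info_cutoff in each MAF bin.

    snps: list of (maf, info) pairs.
    bins: list of (low, high) MAF intervals, low inclusive and high exclusive.
    """
    result = {}
    for low, high in bins:
        infos = [info for maf, info in snps if low <= maf < high]
        # None marks an empty bin rather than a misleading 0% or 100%.
        result[(low, high)] = (sum(info > info_cutoff for info in infos) / len(infos)
                               if infos else None)
    return result

# Made-up (MAF, info) values, binned as <0.3%, 0.3%-5%, and common variants.
snps = [(0.001, 0.20), (0.002, 0.35), (0.010, 0.50),
        (0.020, 0.30), (0.040, 0.90), (0.200, 0.95)]
bins = [(0.0, 0.003), (0.003, 0.05), (0.05, 0.51)]  # 0.51 so MAF = 0.5 is included
for interval, frac in well_imputed_fraction(snps, bins).items():
    print(interval, frac)
```

Running the same tabulation once per array (317k, 610k, 1M) and per reference panel would give a table comparable in shape to the percentages reported above. The toy numbers here carry no meaning.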
Funding: Supported by the Scientific Research Starting Foundation for Doctors, Henan Institute of Science and Technology, China.
Abstract: Predicting animal phenotypic values is difficult because of complex genotype-phenotype relationships. To address this, we developed a neural network model to detect complex non-linear relationships between genotypes and phenotypes, including possible interactions that cannot be expressed as equations. In this paper, a back-propagation neural network is used to examine how different allele frequencies influence the estimation of polygenic phenotypic values. To ensure prediction precision, the data were normalized before the prediction model was trained. The results show that back-propagation artificial neural networks can predict phenotypic values and perform very well for allele frequencies from 0.2 to 0.8. However, when the allele frequency is very small (less than 0.2) or very large (more than 0.8), the prediction model was not reliable and the predicted values should be tested carefully.
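A back-propagation network of the kind described can be sketched in pure Python. Everything below is an assumption for illustration, not the authors' architecture or data:

- one sigmoid hidden layer with a linear output;
- online gradient descent on squared error;
- min-max normalization of the phenotype;
- genotypes scaled to [0, 1];
- simulated additive polygenic data, with the allele frequency `p` applied at every locus so it can be varied.

The class and function names are hypothetical.

```python
import math
import random

def min_max(values):
    """Min-max normalize to [0, 1]; also return lo and span to undo it later."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values], lo, span

class BPNetwork:
    """One sigmoid hidden layer, linear output, trained by online back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train_step(self, x, y, lr):
        h, out = self.forward(x)
        err = out - y  # gradient of 0.5 * (out - y)**2 with respect to out
        for j, hj in enumerate(h):
            # Error signal at hidden unit j (uses w2[j] before it is updated).
            delta = err * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * err * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta * xi
            self.b1[j] -= lr * delta
        self.b2 -= lr * err

    def mse(self, xs, ys):
        return sum((self.forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def simulate(n, n_loci, p, seed=1):
    """Hypothetical additive polygenic trait; p is the allele frequency at every locus."""
    rng = random.Random(seed)
    effects = [rng.uniform(0.5, 1.5) for _ in range(n_loci)]
    xs, ys = [], []
    for _ in range(n):
        g = [sum(rng.random() < p for _ in range(2)) for _ in range(n_loci)]
        xs.append([gi / 2 for gi in g])
        ys.append(sum(e * gi for e, gi in zip(effects, g)) + rng.gauss(0, 0.3))
    return xs, ys

xs, raw_ys = simulate(n=200, n_loci=5, p=0.5)
ys, lo, span = min_max(raw_ys)
net = BPNetwork(n_in=5, n_hidden=4)
before = net.mse(xs, ys)
for epoch in range(200):
    for x, y in zip(xs, ys):
        net.train_step(x, y, lr=0.05)
after = net.mse(xs, ys)
print(f"MSE before: {before:.4f}  after: {after:.4f}")
# Predictions on the original scale: lo + span * net.forward(x)[1]
```

Rerunning `simulate` with `p` near 0.1 or 0.9 yields genotype columns that are mostly identical across individuals. That is one way to probe the allele-frequency effect the abstract reports. This sketch makes no claim about what such a run would show.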
Abstract: Research today increasingly depends on the computational management of large-scale genomic and post-genomic datasets. The rapidly emerging field of bioinformatics, fueled by high-throughput technologies and genome-scale databases, is expected to reshape how research is done. Genomics has shifted the paradigm of biological research and opened many new avenues. Earlier initiatives paved the way for newer and more advanced ones. This review focuses on initiatives implemented to date, such as the Human Genome Project and its influence on digital biology, as well as the projects that followed in its footsteps. The authors also discuss the future potential of personalized medicine and of gene-editing methods such as CRISPR/Cas9, which are thought to have the potential to revolutionize current treatment strategies.
Funding: Supported by the National Natural Science Foundation of China (No. 30225017).
Abstract: Transmission distortion (TD) is a significant departure from the Mendelian expectation for the transmission of genes or chromosomes to offspring. While many biological processes have been implicated, much remains to be understood about TD in humans. Here we present findings from a genome-wide scan for evidence of TD, using haplotype data from 60 trio families in the International HapMap Project. Fisher's exact test was applied to assess the extent of TD at 629,958 SNPs across the autosomes. Based on the empirical distribution of P_Fisher and further permutation tests, we identified 1,205 outlier loci and 224 candidate genes with TD. Using the PANTHER gene ontology database, we found 19 categories of biological processes enriched for candidate genes. In particular, the "protein phosphorylation" category contained the largest number of candidates in both HapMap samples. Further analysis uncovered an intriguing non-synonymous change in PPP1R12B, a gene related to protein phosphorylation. This change appears to influence allele transmission from male parents in the YRI (Yoruba from Ibadan, Nigeria) population. Our findings also indicate an ethnicity-related property of TD signatures in HapMap samples and provide new clues for understanding TD in humans.
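Fisher's exact test on a 2x2 table can be written with only the Python standard library (`math.comb`, Python 3.8+). The sketch uses the usual two-sided definition: the sum of the probabilities of all tables with the same margins that are no more probable than the observed one. The abstract does not say how the table was laid out. The usage line therefore assumes, purely for illustration, rows = transmitted / untransmitted parental alleles and columns = allele A / allele B at one SNP, with invented counts. The function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more probable than the observed one; the relative
    # tolerance keeps floating-point ties with p_obs from being dropped.
    p = sum(px for px in map(prob, range(lo, hi + 1)) if px <= p_obs * (1 + 1e-7))
    return min(1.0, p)

# Hypothetical counts at one SNP: rows = transmitted / untransmitted alleles,
# columns = allele A / allele B.
print(fisher_exact_two_sided(70, 50, 50, 70))
```

In a genome-wide scan, this P value would be computed per SNP. Outliers would then be judged against the empirical distribution and permutations, as the abstract describes, rather than against a fixed nominal threshold.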
Funding: Supported by the 863 Project (Grant No. 2001AA221102), the Natural Science Foundation of Guangdong Province (Grant No. 031673), the Guangzhou Municipal Science and Technology Foundation (Grant Nos. 2002Z3-C7191 & 2004Z3-C7501), and the China Medical Board of New York (Grant No. 01-759).
Abstract: Genetic variation and its functional implications have been a focus of recent genome research. The International Consortium has released the HapMap, and ultra-high-volume genotyping platforms are available. It will therefore soon be possible to use a genome-wide association approach to identify genetic variations responsible for complex traits and diseases. The power of this approach is generally agreed. However, it is debated how much population differences should be exploited, and how best they should be applied. To address this issue, we sequenced 7 genes in the centromeric region of chromosome 15 in 50 Tibetan subjects. We investigated their SNPs, SNP frequencies, tagSNPs, LD structures and haplotypes, and compared them with those of the Han population. Genetic diversity between the two populations was also quantified. Our results show that the overall genetic variation between the two populations is very small. There are differences, however, primarily in allele frequencies, which are a dominant factor for haplotypes and tagSNPs. In general, Tibetans have longer LD and less diversity in the region studied. These data provide genetic evidence for the close relationship between the two populations. They support the idea that all populations are fundamentally the same. They also indicate that population variation, particularly in allele frequency, should be taken into account in analyses of complex traits and diseases. The data obtained in this investigation not only help us understand this genome region but also provide roadmaps for variation studies of these genes and this region in the Tibetan population.
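The abstract reports that Tibetans show longer LD but does not say which LD measure was used. As an assumption, the sketch computes the two standard pairwise measures, |D'| and r², from phased two-SNP haplotypes. Tools such as Haploview first estimate haplotype frequencies from unphased genotypes, a step not shown here. The haplotypes and the function name are invented.

```python
def ld_stats(haplotypes):
    """D, |D'| and r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: list of 2-character strings, one per chromosome, giving the
    allele at SNP 1 then SNP 2 (e.g. "AG").
    """
    n = len(haplotypes)
    a, b = haplotypes[0][0], haplotypes[0][1]  # alleles whose frequencies we track
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == a + b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Two complementary haplotypes: complete LD (|D'| = 1, r^2 = 1).
print(ld_stats(["AG"] * 5 + ["CT"] * 5))
# Adding a third haplotype ("AT") lowers r^2 while |D'| stays 1,
# because one of the four possible haplotypes ("CG") is still absent.
print(ld_stats(["AG"] * 5 + ["CT"] * 4 + ["AT"]))
```

Because r² is also sensitive to allele frequencies, the frequency differences highlighted in the abstract can change r²-based LD blocks and tagSNPs between populations.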
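Abstract: Objective: To study single nucleotide polymorphisms (SNPs) of the ABCA4 gene in the Han Chinese population of Beijing and provide a basis for etiological research. Methods: Genotype data for ABCA4 SNPs in Han Chinese in Beijing, China (CHB), released by the International HapMap Project, were analyzed with Haploview 4.2. Results: Of the 343 ABCA4 SNPs provided by HapMap, 129 (37.6%) were homozygous-genotype SNPs and 214 (62.39%) were qualified SNPs. A total of 95 tag SNPs were identified and 3 haplotype blocks were constructed. In each block, the two most common haplotypes predominated, with cumulative frequencies of 91.1% to 94.0%. Conclusions: Analysis of ABCA4 SNP data in the Beijing Han population yielded tag SNPs, haplotype blocks and major haplotypes, laying a foundation for further etiological research.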
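The Results describe each haplotype block by the cumulative frequency of its two most common haplotypes. A minimal sketch of that summary follows. It assumes phased haplotypes for one block are already available as strings. Haploview itself estimates block haplotype frequencies from genotypes with an EM algorithm, which is not reproduced here. The haplotypes, counts and function name are invented.

```python
from collections import Counter

def top_haplotype_share(block_haplotypes, k=2):
    """Return the k most frequent haplotypes in a block and their cumulative frequency."""
    counts = Counter(block_haplotypes)
    top = counts.most_common(k)
    return [hap for hap, _ in top], sum(n for _, n in top) / len(block_haplotypes)

# Invented block of 100 phased chromosomes over four SNPs.
block = ["ACGT"] * 60 + ["GTCA"] * 32 + ["ACGA"] * 5 + ["GCGT"] * 3
haps, share = top_haplotype_share(block)
print(haps, f"{share:.1%}")  # ['ACGT', 'GTCA'] 92.0%
```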