Funding: Supported by the State Key Basic Research and Development Plan of China (2003CB715904) and the National Science Foundation for Overseas Distinguished Young Scholar (30428003)
Abstract: In many organisms, differences in codon usage patterns among genes reflect variation in local base compositional biases and in the intensity of natural selection. In this study, a comparative analysis was performed to investigate the characteristics of codon bias and the factors shaping codon usage patterns among mitochondrial, chloroplast and nuclear genes in common wheat (Triticum aestivum L.). GC content was higher in nuclear genes than in mitochondrial and chloroplast genes. Neutrality and correspondence analyses indicated that codon usage in nuclear genes is likely the result of a relatively strong mutational bias, whereas the codon usage patterns of mitochondrial and chloroplast genes were more conserved in GC content and influenced by translation level. Parity Rule 2 (PR2) plot analysis showed that pyrimidines were used more frequently than purines at the third codon position in all three genomes. In addition, using a new alternative strategy, 11, 12 and 24 triplets were defined as preferred codons in the mitochondrial, chloroplast and nuclear genes, respectively. These findings suggest that mitochondrial, chloroplast and nuclear genes have distinctly different codon usage features and evolutionary constraints.
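The neutrality and PR2 analyses above rest on a few per-gene quantities: GC content at each codon position (GC12 versus GC3 for the neutrality plot) and the third-position ratios A3/(A3+T3) and G3/(G3+C3) for the PR2 plot. The sketch below computes them for one coding sequence. It is a simplification under stated assumptions: it uses all sense codons, whereas many studies restrict GC3 to synonymous sites and PR2 to fourfold-degenerate codons, and it does not attempt the paper's preferred-codon strategy, which the abstract does not describe. All function names are hypothetical.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_codons(cds):
    """Split a coding sequence into complete upper-case codons; a trailing partial codon is dropped."""
    seq = cds.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def codon_position_stats(cds):
    """GC content at each codon position plus PR2 ratios at the third position (simplified)."""
    codons = [c for c in split_codons(cds) if c not in STOP_CODONS]
    if not codons:
        raise ValueError("no sense codons in sequence")
    counts = [Counter(c[pos] for c in codons) for pos in range(3)]
    n = len(codons)
    gc1, gc2, gc3 = ((cnt["G"] + cnt["C"]) / n for cnt in counts)
    third = counts[2]
    a_t = third["A"] + third["T"]
    g_c = third["G"] + third["C"]
    return {
        "GC1": gc1,
        "GC2": gc2,
        "GC3": gc3,                 # x-axis of a neutrality plot
        "GC12": (gc1 + gc2) / 2,    # y-axis of a neutrality plot
        # PR2 plot axes: values below 0.5 mean T > A and C > G, i.e. a pyrimidine bias.
        "A3/(A3+T3)": third["A"] / a_t if a_t else float("nan"),
        "G3/(G3+C3)": third["G"] / g_c if g_c else float("nan"),
    }


def neutrality_slope(gc3_values, gc12_values):
    """Ordinary least-squares slope of GC12 regressed on GC3 across a set of genes."""
    n = len(gc3_values)
    mx = sum(gc3_values) / n
    my = sum(gc12_values) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(gc3_values, gc12_values))
    sxx = sum((x - mx) ** 2 for x in gc3_values)
    return sxy / sxx


stats = codon_position_stats("ATGGCTTTCAAAGGGTAA")
print(stats["GC3"], stats["GC12"], stats["A3/(A3+T3)"], stats["G3/(G3+C3)"])
```

For the toy sequence, GC3 is 0.6 and GC12 is 0.4. In the usual reading of a neutrality plot, a slope near 1 points to mutational pressure dominating codon usage, while a slope near 0 points to selection.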
Abstract: Blast, a fungal disease, is one of the most serious diseases of rice worldwide. Breeding resistant varieties has proved to be the most effective and economical means of controlling the disease. This paper describes a molecular marker-assisted selection (MAS) procedure for integrating the broad-spectrum blast resistance gene Pi1 into the elite hybrid maintainer line Zhenshan 97. A simple sequence repeat (SSR)-based marker-aided selection system for the Pi1 segment was established. Using a backcross population and the blast isolate F1829, Pi1 was mapped to the top of chromosome 11 between markers RZ536 and RM144, at genetic distances of 9.7 cM and 6.8 cM, respectively. Seventeen families derived from the recurrent parent Zhenshan 97 and homozygous for Pi1 were obtained. The genetic background of the 17 families was assessed by inter simple sequence repeat (ISSR) amplification; across 167 polymorphic bands, the highest recovery of the Zhenshan 97 genetic background was 97.01%.
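The background-recovery figure can be read as a simple proportion. The abstract does not give the formula, so the sketch below assumes the common approach for dominant markers such as ISSR: count only bands that differ between the two parents, then take the percentage at which the backcross line shows the recurrent-parent state. The function name and the band data are hypothetical; 162 matching bands is used only because 162/167 rounds to the reported 97.01%, not because the paper states that count.

```python
def background_recovery(recurrent, donor, line):
    """Percent recurrent-parent background recovery from dominant (presence/absence) bands.

    Each argument maps a band ID to True (band present) or False (absent).
    Only bands that differ between the two parents are informative.
    """
    polymorphic = [b for b in recurrent if recurrent[b] != donor[b]]
    if not polymorphic:
        raise ValueError("no polymorphic bands between the parents")
    matches = sum(1 for b in polymorphic if line[b] == recurrent[b])
    return 100.0 * matches / len(polymorphic)


# Hypothetical data: 167 polymorphic bands, the line matches the recurrent parent at 162 of them,
# plus one monomorphic band that the calculation ignores.
recurrent = {i: True for i in range(167)}
donor = {i: False for i in range(167)}
line = {i: i >= 5 for i in range(167)}
recurrent["mono"] = donor["mono"] = line["mono"] = True
print(f"{background_recovery(recurrent, donor, line):.2f}%")  # 97.01%
```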
Funding: Supported by the National Basic Research Program of China (No. 2006CB101700), the National High-tech Research and Development Program (No. 2006AA10Z165) and the Program for New Century Excellent Talents in University of China (No. NCET2005-05-0502).
Abstract: The high-affinity K+ (HAK) transporter gene family is the largest family of potassium transporters in plants and is important for various aspects of plant life. In the present study, we identified 27 members of this family in the rice genome. Phylogenetic analysis divided the land-plant HAK transporter proteins into 6 distinct groups. Although the main characteristics of this family were established before the origin of seed plants, members from non-seed and seed plants still show some differences. The rice HAK genes expanded in a lineage-specific manner after the monocot-dicot split, and both segmental and tandem duplication events contributed to this expansion. Functional divergence analysis provided statistical evidence for shifted evolutionary rates after gene duplication. Further analysis indicated that both point mutations under positive selection and gene conversion events contributed to the evolution of this family in rice.
Funding: Project supported by the National Basic Research Program (973) of China (No. 2002CB312200) and the Center for Bioinformatics Program Grant of Harvard Center of Neurodegeneration and Repair, Harvard Medical School, Harvard University, Boston, USA
Abstract: In microarray-based cancer classification, gene selection is an important issue because of the large number of variables, the small number of samples and the non-linearity of the data, so conventional linear statistical methods rarely give satisfactory results. Recursive feature elimination based on support vector machines (SVM RFE) is an effective algorithm that integrates gene selection and cancer classification into a consistent framework. In this paper, we propose a new method for selecting the parameters of SVM RFE implemented with Gaussian-kernel SVMs: a genetic algorithm searches for an optimal pair of parameters, as a better alternative to the common practice of simply picking the apparently best parameters. Fast implementation issues for this method are also discussed for practical reasons. The proposed method was tested on two representative datasets, hereditary breast cancer and acute leukaemia. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracy with the selected genes.
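The parameter-search idea can be sketched as a small genetic algorithm over the two Gaussian-kernel SVM parameters, encoded as log2(C) and log2(gamma). This is an illustration under assumptions, not the paper's method: the abstract does not describe its encoding, operators or fitness, so the search ranges, tournament selection, blend crossover, Gaussian mutation and elitism are choices made here, and `toy_fitness` is a hypothetical smooth stand-in for "classification accuracy of SVM RFE with these parameters".

```python
import random

# Assumed search ranges for log2(C) and log2(gamma) (a common grid-search convention, not from the paper).
BOUNDS = [(-5.0, 15.0), (-15.0, 3.0)]


def clip(value, lo, hi):
    return max(lo, min(hi, value))


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def blend_crossover(a, b, rng):
    """The child lies on the segment between its parents, so it stays inside BOUNDS."""
    w = rng.random()
    return [w * x + (1 - w) * y for x, y in zip(a, b)]


def gaussian_mutation(ind, rng, rate=0.2, sigma=1.0):
    """Perturb each gene with probability `rate`, clipping back into its range."""
    return [clip(v + rng.gauss(0.0, sigma), lo, hi) if rng.random() < rate else v
            for v, (lo, hi) in zip(ind, BOUNDS)]


def genetic_search(fitness, pop_size=30, generations=40, elite=2, seed=0):
    """Maximise fitness(log2_c, log2_gamma) with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(*ind) for ind in pop]
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        next_pop = [pop[i] for i in ranked[:elite]]  # elitism: the best survive unchanged
        while len(next_pop) < pop_size:
            parent_a = tournament(pop, scores, rng)
            parent_b = tournament(pop, scores, rng)
            next_pop.append(gaussian_mutation(blend_crossover(parent_a, parent_b, rng), rng))
        pop = next_pop
    best = max(pop, key=lambda ind: fitness(*ind))
    return best, fitness(*best)


# Hypothetical stand-in for cross-validated SVM RFE accuracy: maximum at log2(C) = 3, log2(gamma) = -5.
def toy_fitness(log2_c, log2_gamma):
    return -((log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2)


best, score = genetic_search(toy_fitness)
print(f"C = 2^{best[0]:.2f}, gamma = 2^{best[1]:.2f}, fitness = {score:.4f}")
```

In a real pipeline the fitness would train and validate an RBF SVM, for example scikit-learn's `SVC(kernel="rbf", C=2**log2_c, gamma=2**log2_gamma)`. Note that scikit-learn's generic `RFE` needs `coef_` or `feature_importances_`, which an RBF `SVC` does not expose, so Gaussian-kernel SVM RFE needs its own kernel-based ranking criterion.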
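Abstract: When performing feature selection for two-class problems, the ARCO (AUC and rank correlation coefficient optimization) algorithm measures the redundancy of the selected feature subset with the Spearman rank correlation coefficient, which causes information loss, and its relevance and redundancy measures have inconsistent value ranges. To overcome these shortcomings, an improved Pearson correlation coefficient is proposed to measure feature redundancy, and the ranges of the relevance and redundancy measures are normalized, yielding the APCO (AUC and improved Pearson correlation coefficient optimization) algorithm. Meanwhile, the MAUCD (using MAUC as the relevance metric to rank features directly) and MDFS (MAUC decomposition based feature selection method) algorithms for multi-class feature selection do not consider feature redundancy, and MDFS tends to select locally optimal feature subsets. To address these problems, an improved Pearson correlation coefficient suited to multi-class problems is proposed to measure feature redundancy, yielding the MAUCP and MDFSP algorithms based on the mRMR (maximal relevance-minimal redundancy) framework. Using SVM, NB and KNN as classification tools, classifiers are built on each selected feature subset, and their AUC (MAUC) values are used to measure the performance of that subset. Experimental results on 7 two-class and 3 multi-class imbalanced gene datasets show that the proposed APCO, MAUCP and MDFSP algorithms outperform ARCO, MAUCD and MDFS, respectively, and also outperform other classic gene selection algorithms.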
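The two-class idea behind APCO (AUC for relevance, a Pearson-type coefficient for redundancy, both on a common scale, combined in an mRMR-style greedy search) can be sketched as follows. The abstract does not define the "improved" Pearson coefficient or the exact normalization, so this sketch assumes plain absolute Pearson correlation for redundancy and rescales AUC to 2|AUC - 0.5| so that both terms lie in [0, 1]. It covers only the two-class case, not the MAUC-based multi-class variants, and all names and data are hypothetical.

```python
import math


def auc(values, labels):
    """Rank-based (Mann-Whitney) AUC of one feature for 0/1 labels; ties count 0.5."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((1.0 if p > q else 0.5 if p == q else 0.0) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(features, labels, k):
    """Greedy mRMR (difference form) over a list of feature columns; returns column indices."""
    # Relevance rescaled to [0, 1]: 0 = uninformative (AUC 0.5), 1 = perfect separation.
    relevance = [2.0 * abs(auc(f, labels) - 0.5) for f in features]
    selected, remaining = [], list(range(len(features)))

    def score(j):
        if not selected:
            return relevance[j]
        # Redundancy in [0, 1]: mean absolute correlation with the features chosen so far.
        redundancy = sum(abs(pearson(features[j], features[s])) for s in selected) / len(selected)
        return relevance[j] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


labels = [0, 0, 0, 1, 1, 1]
f0 = [1, 2, 3, 4, 5, 6]    # separates the classes perfectly
f1 = [2, 4, 6, 8, 10, 12]  # equally relevant, but a rescaled copy of f0
f2 = [1, 5, 2, 6, 3, 4]    # weaker, but much less correlated with f0
print(mrmr_select([f0, f1, f2], labels, 2))  # [0, 2]
```

Ranking by AUC alone would pick f0 and f1, two copies of the same signal; the redundancy term makes the search prefer f2 as the second feature.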