An intramolecular isotopic study was conducted on natural gases collected from coal-derived gas reservoirs in sedimentary basins of China to determine their position-specific isotope distributions. Propane from the Turpan-Hami Basin exhibited negative ΔC-T (δ13Ccentral - δ13Cterminal) values ranging from -3.9‰ to -0.3‰, with an average of -2.1‰. Propane from the Ordos Basin, Sichuan Basin, and Tarim Basin showed positive ΔC-T values, with averages of 1.3‰, 5.4‰, and 7.6‰, respectively. Position-specific carbon isotope compositions reveal the precursors and the propane generation pathways in the petroliferous basins. Propane formed from the thermal cracking of Type III kerogen has larger δ13Ccentral and δ13Cterminal values than propane from Type I/II kerogen. The precursor of the natural gases collected in this study is identified as Type III kerogen. Comparing our data with calculated results for thermal cracking of Type III kerogen, we found that propane from the low-maturity gas reservoir in the Turpan Basin was generated via the i-propyl radical pathway, whereas propane from the Sulige tight gas reservoir in the Ordos Basin was formed via the n-propyl radical pathway. δ13Cterminal values covered a narrow range across basins, in contrast to δ13Ccentral. The terminal carbon position in propane is less affected by microbial oxidation and more closely tied to maturity level and precursor type. Thus, δ13Cterminal has good potential for inferring the origin and maturity level of natural gas. In examining post-generation processes, we proposed an improved strategy for identifying microbial oxidation of natural gases, based on the position-specific carbon isotope distributions of propane. Samples from the Liaohe Depression of the Bohai Bay Basin and from the Sichuan Basin showed evidence of post-generation microbial oxidation. Overall, the position-specific carbon isotope composition of propane provides new atomic-level insights into the generation mechanism and post-generation processes of natural gas over geological time.
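The ΔC-T quantity defined in the abstract above is a simple difference between the central and terminal δ13C values of propane. A minimal sketch of that arithmetic follows; the function names and the sample values are hypothetical placeholders chosen for illustration, not measurements from the study.

```python
from statistics import mean


def delta_c_t(d13c_central, d13c_terminal):
    """Return ΔC-T = δ13C(central) - δ13C(terminal), in per mil."""
    return d13c_central - d13c_terminal


def sign_label(delta):
    """Label a ΔC-T value by its sign only; no geological interpretation."""
    if delta < 0:
        return "negative"
    if delta > 0:
        return "positive"
    return "zero"


# Hypothetical (central, terminal) δ13C pairs in per mil, NOT study data.
samples = [(-27.0, -25.0), (-26.5, -25.5), (-24.0, -26.0)]

deltas = [delta_c_t(central, terminal) for central, terminal in samples]
labels = [sign_label(d) for d in deltas]
average = mean(deltas)
```

Averaging the per-sample differences, as done here, mirrors how the abstract reports one mean ΔC-T value per basin.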
Protein secretion plays an important role in bacterial lifestyles. In Gram-negative bacteria, a wide range of proteins are secreted via various secretion systems to modulate the interactions of bacteria with their environments and with other bacteria. These proteins are essential for bacterial virulence, so studying them is crucial for understanding the pathogenesis of diseases and for developing drugs. Using amino acid composition (AAC), the position-specific scoring matrix (PSSM), and N-terminal signal peptides, two different substitution models are first constructed to transform protein sequences into numerical vectors. Then, based on the support vector machine (SVM) and the "one to one" algorithm, a hybrid multi-classifier named SecretP v.2.2 is proposed to rapidly and accurately distinguish different types of Gram-negative bacterial secreted proteins. When evaluated on the same test set as other methods, SecretP v.2.2 achieves the highest total sensitivity, 93.60%. A public independent dataset is used to further test the power of SecretP v.2.2 for predicting NCSPs, and it also yields satisfactory results.
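Of the three feature sources named in the abstract above, amino acid composition (AAC) is the simplest to illustrate: a sequence is mapped to a 20-dimensional vector of residue frequencies. The sketch below is a generic AAC encoder under that standard definition; it is not taken from SecretP v.2.2, and the function name and alphabet ordering are assumptions.

```python
from collections import Counter

# The 20 standard amino acids in alphabetical one-letter order (assumed ordering).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20-dimensional frequency vector of a protein sequence.

    Characters outside the 20 standard residues (e.g. X, gaps) are ignored.
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence for illustration only.
vector = amino_acid_composition("AACD")
```

A vector of this kind could then be passed to any classifier; how SecretP v.2.2 combines it with PSSM and signal-peptide features is not specified in the abstract.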
Subcellular location is one of the key biological characteristics of proteins. In this article, position-specific profiles (PSP) are introduced as important characteristics of proteins. To obtain position-specific profiles, the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST) is used to search for protein sequences in a database. Position-specific scoring matrices are extracted from the profiles as one class of characteristics. Four-part amino acid compositions and 1st-7th order dipeptide compositions are calculated as the other two classes of characteristics. In total, twelve characteristic vectors are extracted from each protein sequence. Next, the characteristic vectors are weighted by a simple weighting function and fed into a BP neural network predictor named PSP-Weighted Neural Network (PSP-WNN). The Levenberg-Marquardt algorithm, rather than the error back-propagation algorithm, is employed to adjust the weight matrices and thresholds during network training. With a jackknife test on the RH2427 dataset, PSP-WNN achieved an overall prediction accuracy of 88.4%, higher than the results of the general BP neural network, the Markov model, and the fuzzy k-nearest neighbors algorithm on this dataset. In addition, the prediction performance of PSP-WNN was evaluated with a five-fold cross-validation test on the PK7579 dataset, and its results were consistently better than those of a previous method based on several support vector machines that used compositions of both amino acids and amino acid pairs. These results indicate that PSP-WNN is a powerful tool for subcellular localization prediction. At the end of the article, the influence of different weighting proportions among the three categories of characteristic vectors on prediction accuracy is discussed, and an appropriate proportion is selected according to the resulting gain in prediction accuracy.
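The jackknife test mentioned in the abstract above is leave-one-out validation: each sample is held out in turn, the predictor is built from the remaining samples, and the held-out sample is then classified. A minimal sketch of that procedure follows. It uses a 1-nearest-neighbor rule as a stand-in classifier purely for brevity; this is an assumption and not the PSP-WNN network, and the toy data are hypothetical.

```python
import math


def nearest_neighbor_predict(train, query):
    """Stand-in classifier: return the label of the closest training vector."""
    best_label, best_dist = None, math.inf
    for vector, label in train:
        dist = math.dist(vector, query)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(dataset):
    """Leave-one-out accuracy over a list of (vector, label) pairs."""
    if len(dataset) < 2:
        raise ValueError("jackknife test needs at least two samples")
    correct = 0
    for i, (vector, label) in enumerate(dataset):
        # Hold out sample i and predict it from all the others.
        train = dataset[:i] + dataset[i + 1:]
        if nearest_neighbor_predict(train, vector) == label:
            correct += 1
    return correct / len(dataset)


# Hypothetical 2-D feature vectors with two class labels.
toy = [((0.0, 0.0), "a"), ((0.1, 0.0), "a"),
       ((1.0, 1.0), "b"), ((0.9, 1.0), "b")]
accuracy = jackknife_accuracy(toy)
```

Because every sample is tested exactly once and never seen during its own training round, the jackknife yields a single, deterministic accuracy for a given dataset, unlike a randomly partitioned five-fold split.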
The number and arrangement of subunits that form a protein are referred to as its quaternary structure. Knowing the quaternary structure of an uncharacterized protein provides clues to its biological function and to its interactions with other molecules in a biological system. With the explosion of protein sequences generated in the post-genomic age, it is vital to develop an automated method to meet this challenge. To explore this problem, we adopted an approach in which a protein sample is represented by the pseudo position-specific score matrix (Pse-PSSM) descriptor proposed by Chou and Shen. The Pse-PSSM descriptor is advantageous in that it combines evolutionary information and sequence-correlated information. However, incorporating all these effects into one descriptor may cause a "high dimension disaster". To overcome this problem, Chou and Shen adopted a fusion approach. Here, a completely different approach is introduced: the linear dimensionality reduction algorithm principal component analysis (PCA) is used to extract key features from the high-dimensional Pse-PSSM space. The resulting dimension-reduced descriptor vector is a compact representation of the original high-dimensional vector. The jackknife test results indicate that the dimensionality reduction approach is efficient in coping with complicated problems in biological systems, such as predicting the quaternary structure of proteins.
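The core idea of the abstract above, projecting high-dimensional descriptors onto directions of greatest variance, can be sketched in a few lines. The example below recovers only the first principal component, using power iteration on the sample covariance matrix so that it needs no external library; this simplification, the function name, and the toy data are assumptions, and real use would normally rely on a full PCA from a numerical library.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, projected scores) for the top principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration; the all-ones start must not be orthogonal to the answer.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical 2-D descriptors lying exactly on the line y = x.
direction, scores = first_principal_component([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
```

Here the two original coordinates collapse into one score per sample without losing information, which is the compact representation the abstract describes for the Pse-PSSM space.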
In this study, we introduce a novel bioinformatics program, Spore-associated Symbiotic Microbes Position-specific Function (SeSaMe PS Function), for position-specific functional analysis of short sequences derived from metagenome sequencing data of arbuscular mycorrhizal fungi. The unique advantage of the program lies in its databases, which are built on genus-specific sequence properties derived from protein secondary structure, namely amino acid usages, codon usages, and codon contexts of 3-codon DNA 9-mers. SeSaMe PS Function searches a query sequence against a reference sequence database, identifies 3-codon DNA 9-mers with structural roles, and creates a comparative dataset containing the codon usage biases of the 3-codon DNA 9-mers from 54 bacterial and fungal genera. The program applies correlation principal component analysis in conjunction with the K-means clustering method to the comparative dataset. 3-codon DNA 9-mers that are clustered as a sole member or with only a few members are often structurally and functionally distinctive sites that provide useful insights into important molecular interactions. The program provides a versatile means for studying the functions of short sequences from metagenome sequencing and has a wide spectrum of applications. SeSaMe PS Function is freely accessible at www.fungalsesame.org.
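The unit of analysis in the abstract above is the 3-codon DNA 9-mer, that is, nine nucleotides spanning three consecutive codons. The sketch below enumerates such 9-mers from a coding sequence. It assumes the windows are taken in reading frame and advance one codon at a time; that windowing rule and the function name are assumptions for illustration and are not confirmed as the behavior of SeSaMe PS Function.

```python
def three_codon_9mers(dna, frame=0):
    """Return in-frame 9-mers, each made of three consecutive codons.

    `frame` is the 0-based offset of the first codon (assumed known).
    Trailing nucleotides that do not complete a codon are dropped.
    """
    seq = dna.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    # Slide a three-codon window forward one codon at a time.
    return ["".join(codons[i:i + 3]) for i in range(len(codons) - 2)]


# Toy coding sequence of four codons: ATG GCC AAA TTT.
kmers = three_codon_9mers("ATGGCCAAATTT")
```

Each 9-mer produced this way keeps codon boundaries intact, so per-codon statistics such as codon usage can be computed on it directly.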
Background: Arsenic has broad anti-cancer activity against hematologic malignancies and solid tumors. To systematically understand the biological functions of arsenic, we need to identify arsenic-binding proteins in human cells. However, owing to the lack of effective theoretical tools and experimental methods, only a few arsenic-binding proteins have been identified. Methods: Based on the crystal structure of ArsM, we generated a single-mutation free energy profile for arsenic binding using free energy perturbation methods. Multiple validations indicate that our computational model can predict arsenic-binding proteins with desirable accuracy. We subsequently applied this computational model to scan the entire human genome and identify all potential arsenic-binding proteins. Results: The computationally predicted arsenic-binding proteins show a wide range of biological functions, especially in signal transduction pathways. In these pathways, arsenic binds directly to key factors (e.g., Notch receptors, Notch ligands, Wnt family proteins, TGF-beta, and their interacting proteins) and significantly inhibits their enzymatic activities, which in turn has a crucial impact on the related signaling pathways. Conclusions: Arsenic has a significant impact on signal transduction in cells. Arsenic binding can lead to dysfunction of the target proteins, with crucial impacts on both signaling pathways and gene transcription. We hope that the computationally predicted arsenic-binding proteins and the functional analysis provide novel insight into the biological functions of arsenic, revealing a mechanism for its broad anti-cancer activity.
Funding (propane position-specific isotope study): financially supported by the National Natural Science Foundation of China (Grant Nos. 42102202 and 41930426).
Funding (PSP-WNN subcellular localization study): the National Natural Science Foundation of China (No. 60471003).
Funding (Pse-PSSM quaternary structure study): supported by the National Natural Science Foundation of China (Grant No. 60704047).
Funding (arsenic-binding protein study): This work was supported by the National Key R&D Program of China (Nos. 2016YFC0901704 and 2017YFA0505500), the National High-Tech R&D Program (863 Program, No. 2015AA020105), the National Natural Science Foundation of China (Nos. 21377085 and 31770070), the MOE New Century Excellent Talents in University program (No. NCET-12-0354), and the SJTU Med-Eng Joint Program (No. YG2016MS33).