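Abstract: To address the difficulty of separating methane, ethane, and propane (C_1, C_2, C_3) in natural gas, this work used high-throughput computation to evaluate the adsorption and separation performance of 137,953 hypothetical metal-organic frameworks (MOFs) for mixtures of these three gases. To avoid competitive adsorption of water vapor, 31,399 hydrophobic MOFs were screened out first. Univariate analysis was then used to relate six structural/energetic MOF descriptors, namely the largest cavity diameter (LCD), void fraction (Φ), volumetric surface area (VSA), Henry's coefficient (K), heat of adsorption (Q_(st)), and density (ρ), to the selectivity, the adsorption capacity, and the trade-off between the two (trade-off between S_(i/j) and N_i, TSN) of the MOFs for C_1, C_2, and C_3. This analysis revealed a "second peak" in adsorption capacity and selectivity; for the separation of C_1 and C_2 in particular, all of the optimal MOFs fall within the second-peak region. Four machine learning algorithms, decision tree, random forest (RF), support vector machine, and back-propagation neural network, were then trained to predict the relationship between the six MOF descriptors and the performance metrics, and RF gave the best predictions. RF was then used to quantify the relative importance of the descriptors: K, LCD, and ρ are the three most important for TSN_(C1) and TSN_(C2), whereas K, Q_(st), and ρ are the three most important for TSN_(C3). Based on these descriptors, decision-tree model paths were designed for the optimal MOFs for adsorbing C_1, C_2, and C_3, respectively. Finally, 18 optimal MOFs were identified for the different C_1, C_2, and C_3 separation applications. The research approach based on machine learning and high-throughput computation, the discovery of the second-peak pattern, and the proposed optimal design routes offer guidance for the development of MOFs for adsorption separation.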
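The abstract names the selectivity S_(i/j), the adsorption capacity N_i, and their trade-off TSN, but does not give formulas. The sketch below assumes the standard adsorption selectivity, S_(i/j) = (x_i/x_j)/(y_i/y_j), and one form of the trade-off that appears in the MOF screening literature, TSN = N_i · ln(S_(i/j)). Both definitions, the function names, and the numbers are illustrative assumptions, not taken from the paper.

```python
import math


def selectivity(x_i, x_j, y_i, y_j):
    """Adsorption selectivity S_(i/j) = (x_i / x_j) / (y_i / y_j).

    x_*: mole fractions in the adsorbed phase.
    y_*: mole fractions in the bulk gas phase.
    """
    return (x_i / x_j) / (y_i / y_j)


def tsn(n_i, s_ij):
    """Trade-off between capacity and selectivity.

    ASSUMPTION: TSN = N_i * ln(S_(i/j)); the abstract does not state the formula.
    """
    return n_i * math.log(s_ij)


# Hypothetical values for one MOF and one gas pair (not data from the paper).
s = selectivity(x_i=0.6, x_j=0.4, y_i=0.1, y_j=0.9)
score = tsn(n_i=2.0, s_ij=s)
print(f"S = {s:.2f}, TSN = {score:.3f}")
```

Under this assumed form, a MOF scores well only when it is both selective (S well above 1) and has a high uptake; a non-selective material with S = 1 scores zero regardless of capacity.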
Funding: Supported by the National Key Research and Development Program of China (2022YFA1004302).
Abstract: Accurate prediction of protein-ligand complex structures is a crucial step in structure-based drug design. Traditional molecular docking methods are limited in accuracy and sampling space, while relying on machine learning approaches alone may produce invalid conformations. In this study, we propose a strategy that combines molecular docking and machine learning methods. First, protein-ligand binding poses are predicted using a deep learning model. Next, position-restricted docking around the predicted binding poses is performed using Uni-Dock, generating physically constrained and valid binding poses. Finally, the binding poses are re-scored and ranked using machine learning scoring functions. This strategy combines the predictive power of machine learning with the physical constraints of molecular docking. Evaluation experiments on multiple datasets demonstrate that, compared with using molecular docking or machine learning methods alone, the proposed strategy can significantly improve the success rate and accuracy of protein-ligand complex structure prediction.
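The abstract reports a "success rate" without defining it. A common convention in docking benchmarks, assumed here and not confirmed by the abstract, is to count a prediction as successful when the heavy-atom RMSD between the predicted and reference ligand poses is at most 2 Å. The minimal sketch below computes that metric for poses given as lists of (x, y, z) coordinates in the same atom order; it performs no superposition and no symmetry correction, and all names and coordinates are hypothetical.

```python
import math


def rmsd(pred, ref):
    """Root-mean-square deviation between two poses with matching atom order."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (p - r) ** 2
        for atom_pred, atom_ref in zip(pred, ref)
        for p, r in zip(atom_pred, atom_ref)
    )
    return math.sqrt(squared / len(pred))


def success_rate(pairs, cutoff=2.0):
    """Fraction of (predicted, reference) pose pairs with RMSD <= cutoff.

    ASSUMPTION: the 2.0 Å cutoff is a common benchmark convention,
    not a value taken from the abstract.
    """
    hits = sum(1 for pred, ref in pairs if rmsd(pred, ref) <= cutoff)
    return hits / len(pairs)


# Hypothetical three-atom ligand: one exact pose and one shifted 3 Å along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted = [(x + 3.0, y, z) for x, y, z in reference]
print(success_rate([(reference, reference), (shifted, reference)]))
```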
Funding: Supported by the University of Macao Research Grants (MYRG2016-00038-ICMS-QRCM and MYRG2016-00040-ICMS-QRCM, Macao, China).
Abstract: Most pharmaceutical formulation development is complex, and ideal formulations are generally obtained only after extensive experimentation. Machine learning is advancing many aspects of modern society and has achieved significant success in multiple fields. This study demonstrates that machine learning can be used to build highly accurate predictive models for drug/cyclodextrin (CD) systems. Molecular descriptors of the compounds and the experimental conditions were used as inputs, and the complexation free energy was the output. The light gradient boosting machine provided significantly better predictive performance than random forest and deep learning, with a mean absolute error of 1.38 kJ/mol and a squared correlation coefficient of 0.86. Evaluating the relative importance of the molecular descriptors further identified the key factors affecting molecular interactions in drug/CD systems. For the specific ketoprofen-CD systems, the machine learning model showed better predictive performance than molecular modeling calculations, while molecular simulation could provide structural, dynamic, and energetic information. Integrating machine learning and molecular simulation could produce a synergistic effect for interpreting and predicting pharmaceutical formulations. In conclusion, the developed predictive models were able to quickly and accurately predict the solubilizing capacity of CD systems. This research takes an important step toward the application of machine learning in pharmaceutical formulation design.
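The two figures quoted above, the mean absolute error and the squared correlation coefficient, can be computed as in the following minimal sketch. It assumes the "squared correlation coefficient" is the square of the Pearson correlation between measured and predicted values; the abstract does not say whether that or the coefficient of determination was used. The function names and sample values are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def squared_correlation(y_true, y_pred):
    """Square of the Pearson correlation coefficient.

    ASSUMPTION: this is one reading of "squared correlation coefficient";
    the coefficient of determination is another and can differ from it.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


# Hypothetical complexation free energies in kJ/mol (not data from the paper).
measured = [-12.0, -15.5, -18.0, -20.5]
predicted = [-13.0, -15.0, -17.0, -22.0]
print(mean_absolute_error(measured, predicted))
print(squared_correlation(measured, predicted))
```

Note that the squared Pearson correlation is insensitive to systematic offset or scaling of the predictions, which is why it is usually reported alongside an error measure such as the mean absolute error.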
Funding: The National Key Research Program (grant nos. 2021YFA0910101, 2018YFC1602900, and 2019YFA0905800), the National Natural Science Foundation of China (NSFC; grant nos. 21922404, 22174039, 22107027, and 21827811), and the Science and Technology Project of Hunan Province (grant nos. 2022JJ10005, 2021RC4022, 2019SK2201, 2018RS3035, and 2017XK2103).
Abstract: Molecular profiling of cell-surface proteins is a powerful strategy for precise cancer diagnosis. While mass cytometry (MC) enables simultaneous detection of over 40 cellular parameters, its full potential in disease classification is limited by the few types of recognition probes currently available. In this work, we synthesize a panel of heavy-isotope-conjugated aptamers to profile cancer-associated signatures on the surface of hematological malignancy (HM) cells. Based on 15 molecular signatures, we performed cell-surface profiling that allowed the precise classification of 8 HM cell lines. Combined with machine learning, this aptamer-based MC platform also achieved multiclass identification of HM subtypes in clinical samples, with 100% accuracy in the training cohort and 80% accuracy in the test cohort. We therefore report an effective and practical strategy for precise cancer classification at the single-cell level, paving the way for its clinical use in the near future.