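Abstract: The misuse of antibiotics has produced drug-resistant bacteria and seriously hindered progress in treating bacterial infections. Phages have eased the spread of bacterial resistance, and phage therapy is gradually becoming an important means of treating bacterial infections. Screening therapeutic phages more efficiently requires methods that are faster and more effective than traditional wet-lab experiments. Traditional computational methods usually treat the host as the prediction label and overlook the fact that the interaction arises between the two sequences. In addition, existing methods are generally limited to the species and genus levels, and intraspecies prediction methods are very rare. Because complete genome information for a species is often hard to obtain, most strains in existing databases carry only partial genome information. Against this background, a new phage-host interaction prediction method is proposed: it constructs difference feature vectors and applies K-means negative sampling to select representative negative samples for model training. On this basis, the study developed DiffXGPBI, a prediction tool built on difference feature vectors and XGBoost. Experimental results show a mean prediction AUC (Area Under the Curve) of 0.92 across bacterial species, and an overall AUC and AUPR (Area Under the Precision-Recall curve) of 0.91 and 0.88 on the external validation set, outperforming other prediction tools. Ablation experiments on features and models show that each module of DiffXGPBI is necessary and does improve predictive performance, which supports the soundness of the model. Prediction experiments on new, previously unseen PHIs (phage-host interactions) and on intraspecies cases further verify the generalization ability of DiffXGPBI and its potential for intraspecies prediction. In summary, the proposed feature engineering and prediction approach improves the robustness and stability of interaction prediction, generalizes well, and offers new directions and insights for rapid screening in phage therapy.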
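The abstract names difference feature vectors and K-means negative sampling but gives no implementation details, so the following is only a minimal sketch of how the two ideas could fit together. Everything specific here is an assumption made for illustration: the 3-mer frequency features, the element-wise subtraction, the rule of keeping the negative pair closest to each cluster centroid, and all function names (`kmer_freq`, `diff_vector`, `sample_negatives`). The XGBoost training step is omitted.

```python
import random
from itertools import product

K = 3  # k-mer length: an illustrative assumption, not taken from the abstract
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_freq(seq):
    """Normalized k-mer frequency vector of a DNA sequence."""
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - K + 1):
        kmer = seq[i:i + K]
        if kmer in counts:  # skip k-mers containing ambiguous bases
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in KMERS]

def diff_vector(phage_seq, host_seq):
    """Element-wise difference between phage and host feature vectors."""
    p, h = kmer_freq(phage_seq), kmer_freq(host_seq)
    return [a - b for a, b in zip(p, h)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)

def sample_negatives(neg_vectors, k):
    """Per cluster, keep the index of the negative pair nearest the centroid."""
    centroids, labels = kmeans(neg_vectors, k)
    chosen = []
    for c in range(k):
        idx = [i for i, l in enumerate(labels) if l == c]
        if idx:
            chosen.append(min(idx, key=lambda i: sq_dist(neg_vectors[i], centroids[c])))
    return sorted(chosen)
```

In this sketch, the selected negative difference vectors would be combined with the positive ones and passed to a classifier; the abstract states that DiffXGPBI uses XGBoost for that step.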
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 31825008, 31422014, and 61872117).
Abstract: Phage-microbe interactions are appealing systems for studying coevolution, and have received increasing attention because of their roles in human health, disease, and the development of novel therapeutics. Phage-microbe interactions leave diverse signals in bacterial and phage genomic sequences, defined as phage-host interaction signals (PHISs), which include clustered regularly interspaced short palindromic repeats (CRISPR) targeting, prophage, and protein-protein interaction signals. In the present study, we developed a novel tool, phage-host interaction signal detector (PHISDetector), to predict phage-host interactions by detecting and integrating diverse in silico PHISs, and by scoring the probability of phage-host interactions using machine learning models based on PHIS features. We evaluated the performance of PHISDetector on multiple benchmark datasets and application cases. When tested on a dataset of 758 annotated phage-host pairs, PHISDetector yields prediction accuracies of 0.51 and 0.73 at the species and genus levels, respectively, outperforming other phage-host prediction tools. When applied to 125,842 metagenomic viral contigs (mVCs) derived from 3042 geographically diverse samples, it achieved a detection rate of 54.54%. Furthermore, PHISDetector could predict infecting phages for 85.6% of 368 multidrug-resistant (MDR) bacteria and 30% of 454 human gut bacteria obtained from the National Institutes of Health (NIH) Human Microbiome Project (HMP). PHISDetector can be run either as a web server (http://www.microbiome-bigdata.com/PHISDetector/) for general users to study individual inputs or as a stand-alone version (https://github.com/HITImmunologyLab/PHISDetector) to process massive phage contigs from virome studies.
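Of the signals listed above, CRISPR targeting is the simplest to illustrate: a spacer in the host's CRISPR array that matches a phage genome suggests a past encounter. The sketch below is not PHISDetector's implementation. It is a toy scan under stated assumptions: ungapped matching on both strands, a hypothetical threshold of two mismatches, and made-up function names (`best_hit`, `crispr_signal`). Production pipelines typically rely on alignment tools rather than a brute-force scan.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def best_hit(spacer, genome):
    """Fewest mismatches between the spacer and any same-length genome window.

    Returns None when the spacer is empty or longer than the genome.
    """
    n = len(spacer)
    if n == 0 or len(genome) < n:
        return None
    return min(
        sum(a != b for a, b in zip(spacer, genome[i:i + n]))
        for i in range(len(genome) - n + 1)
    )

def crispr_signal(spacers, phage_genome, max_mismatches=2):
    """Count host spacers that match either strand of the phage genome.

    max_mismatches is an illustrative threshold, not a value from the source.
    """
    strands = (phage_genome, reverse_complement(phage_genome))
    hits = 0
    for spacer in spacers:
        scores = [best_hit(spacer, s) for s in strands]
        scores = [s for s in scores if s is not None]
        if scores and min(scores) <= max_mismatches:
            hits += 1
    return hits
```

A count like this could serve as one PHIS feature among several; the abstract states that PHISDetector integrates multiple signals and scores them with machine learning models.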