Abstract
G protein-coupled receptors (GPCRs) are among the most important human drug targets: about 34% of modern drugs on the market act on GPCRs. In drug discovery, accurately predicting the bioactivity of ligand molecules is critical for screening hit compounds. For any single GPCR task, the number of ligands with experimentally measured bioactivity is very limited. Learning multiple GPCR tasks jointly through matrix factorization can exploit the correlations among tasks and thereby improve prediction performance. On this basis, this paper proposes MFSI, a matrix factorization-based method for predicting the bioactivity of ligands targeting GPCRs. MFSI couples side information from the extended connectivity fingerprints of the ligand molecules and overcomes the large number of missing values inherent in known GPCR-ligand bioactivity matrices. The method was tested on 72 representative GPCR tasks covering 24 GPCR subfamilies. The results show that it is superior across the board to classical single-task learning and matrix factorization methods. On most datasets (66/72) it also outperforms other deep multi-task learning-based methods for predicting ligand bioactivity, and compared with DeepNeuralNet-QSAR it improves r² and root mean square error (RMSE) by an average of 18% and 12%, respectively, over all datasets.
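To make the core idea concrete, the following is a minimal sketch of matrix factorization fitted only on the observed entries of a target-by-ligand bioactivity matrix. It is not the MFSI implementation: the coupling with extended connectivity fingerprint side information is omitted, and the function names, hyperparameters, and synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def masked_mf(Y, mask, rank=2, lr=0.005, reg=0.01, epochs=3000, seed=0):
    """Fit Y ~ U @ V.T by gradient descent using observed entries only.

    Y    : (n_targets, n_ligands) bioactivity matrix; missing entries may be NaN.
    mask : boolean array of the same shape, True where a value was measured.
    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    Y0 = np.where(mask, Y, 0.0)              # neutralize NaNs at missing positions
    U = 0.1 * rng.standard_normal((Y.shape[0], rank))
    V = 0.1 * rng.standard_normal((Y.shape[1], rank))
    for _ in range(epochs):
        E = mask * (U @ V.T - Y0)            # residual, zeroed where unobserved
        # Simultaneous update of both factors with L2 regularization.
        U, V = U - lr * (E @ V + reg * U), V - lr * (E.T @ U + reg * V)
    return U, V

def rmse(pred, truth, mask):
    """Root mean square error restricted to the entries selected by mask."""
    return float(np.sqrt(np.mean((pred[mask] - truth[mask]) ** 2)))

# Synthetic stand-in data: 6 hypothetical targets, 40 hypothetical ligands.
rng = np.random.default_rng(1)
Y_true = rng.standard_normal((6, 2)) @ rng.standard_normal((40, 2)).T
mask = rng.random(Y_true.shape) < 0.7        # roughly 70% of entries observed
Y_obs = np.where(mask, Y_true, np.nan)       # unobserved entries are missing

U, V = masked_mf(Y_obs, mask)
rmse_fit = rmse(U @ V.T, Y_true, mask)
# Baseline: predict the mean of the observed values everywhere.
rmse_baseline = rmse(np.full_like(Y_true, Y_true[mask].mean()), Y_true, mask)
print(f"observed-entry RMSE: fit={rmse_fit:.3f}, mean baseline={rmse_baseline:.3f}")
```

Because each row factor in U is fitted against the shared ligand factors in V, a target with few measured ligands still benefits from what the other targets reveal about those ligands, which is the cross-task correlation the abstract refers to.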
Authors
WU Jiansheng (吴建盛)
LAN Chuangchuang (兰闯闯)
QIN Jie (秦洁)
ZHU Yanxiang (朱燕翔)
HU Haifeng (胡海峰)
School of Geographic and Biological Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China; School of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, Jiangsu, China; VeriMake Research, Nanjing Qujike Info-tech Co., Ltd., Nanjing 210088, Jiangsu, China
Source
Journal of Shaanxi Normal University: Natural Science Edition (《陕西师范大学学报(自然科学版)》)
CAS
CSCD
Peking University Core Journal (北大核心)
2021, No. 1, pp. 1-13 (13 pages)
Funding
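National Natural Science Foundation of China (61872198, 61971216, 81771478, 81973512)
Natural Science Research Project of Jiangsu Higher Education Institutions (18KJB416005)
Scientific Research Foundation of Nanjing University of Posts and Telecommunications (NY218092)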
Keywords
G protein-coupled receptors
bioactivities of ligands
matrix factorization
extended connectivity fingerprints