The acid-base dissociation constant (pKa) is a key physicochemical parameter in chemical science, especially in organic synthesis and drug discovery. Current methodologies for pKa prediction still suffer from a limited applicability domain and a lack of chemical insight. Here we present MF-SuP-pKa (multi-fidelity modeling with subgraph pooling for pKa prediction), a novel pKa prediction model that uses subgraph pooling, multi-fidelity learning and data augmentation. In our model, a knowledge-aware subgraph pooling strategy was designed to capture the local and global environments around the ionization sites for micro-pKa prediction. To overcome the scarcity of accurate pKa data, low-fidelity data (computational pKa) were used to fit the high-fidelity data (experimental pKa) through transfer learning. The final MF-SuP-pKa model was constructed by pre-training on the augmented ChEMBL data set and fine-tuning on the DataWarrior data set. Extensive evaluation on the DataWarrior data set and three benchmark data sets shows that MF-SuP-pKa outperforms state-of-the-art pKa prediction models while requiring much less high-fidelity training data. Compared with Attentive FP, MF-SuP-pKa reduces the mean absolute error (MAE) by 23.83% on the acidic set and by 20.12% on the basic set.
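The abstract says the subgraph pooling captures both the local and the global environment of an ionization site. As a rough illustration only, and not the authors' implementation, the sketch below assumes the "local" subgraph is the set of atoms within k bonds of the ionizable atom, mean-pools atom features over it, and concatenates the result with a whole-molecule mean. The function names, the k-hop definition and the one-hot [is_C, is_O] features are all hypothetical; in a graph neural network, learned atom embeddings would typically take the place of such hand-written features.

```python
from collections import deque
from typing import Dict, List, Sequence


def k_hop_atoms(adj: Dict[int, List[int]], center: int, k: int) -> List[int]:
    """Return the atoms within k bonds of `center` (breadth-first search, center included)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        atom = queue.popleft()
        if dist[atom] == k:  # do not expand beyond k bonds
            continue
        for nbr in adj[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return sorted(dist)


def mean_pool(features: Sequence[Sequence[float]], atoms: Sequence[int]) -> List[float]:
    """Average the feature vectors of the given atoms, dimension by dimension."""
    dim = len(features[0])
    return [sum(features[a][j] for a in atoms) / len(atoms) for j in range(dim)]


def site_representation(adj: Dict[int, List[int]],
                        features: Sequence[Sequence[float]],
                        site: int, k: int = 2) -> List[float]:
    """Hypothetical site vector: [local k-hop mean | global whole-molecule mean]."""
    local = mean_pool(features, k_hop_atoms(adj, site, k))
    global_ = mean_pool(features, range(len(features)))
    return local + global_


# Propanoic acid heavy atoms: C0-C1-C2(=O3)-O4H; O4 is the acidic ionization site.
adj = {0: [1], 1: [0, 2], 2: [1, 3, 4], 3: [2], 4: [2]}
feats = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # hypothetical one-hot [is_C, is_O]

rep = site_representation(adj, feats, site=4, k=1)
print(rep)  # [0.5, 0.5, 0.6, 0.4]
```

Keeping the local and global parts as separate halves of the vector lets a downstream regressor weigh the immediate chemical surroundings of the ionizable group differently from whole-molecule properties; the radius k controls how much of the neighborhood counts as "local".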
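Quantum chemical parameters were extracted to characterize the structures of benzoic acid derivatives. Multiple regression and artificial neural network methods were used to build two-dimensional mathematical models relating the structures of these compounds to their pKa values, and the CoMFA method was further applied for a three-dimensional study. The artificial neural network and CoMFA methods gave relatively good results. The effects of steric and electrostatic interactions on the pKa values were also discussed.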
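The multiple-regression step relates pKa linearly to a set of quantum chemical descriptors. The abstract does not say which descriptors were used, so the sketch below is only a hedged illustration of the mechanics: two hypothetical descriptor columns, synthetic (not experimental) values generated from known coefficients, and an ordinary least-squares fit of pKa = b0 + b1*x1 + b2*x2 through the normal equations. It does not reproduce the published model, and the neural network and CoMFA analyses are not covered.

```python
from typing import List, Sequence


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solving (A^T A) b = A^T y by Gauss-Jordan elimination."""
    rows = [[1.0, *map(float, x)] for x in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for col in range(p):
        # Partial pivoting: swap in the row with the largest entry in this column
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]  # raises ZeroDivisionError if descriptors are collinear
        M[col] = [v / piv for v in M[col]]
        for r in range(p):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][p] for i in range(p)]  # [b0, b1, b2, ...]


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the fitted linear model for one descriptor vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


# Synthetic, noise-free data from hypothetical coefficients b0=4.2, b1=-1.0, b2=0.5
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [4.2 - 1.0 * a + 0.5 * b for a, b in X]

coef = fit_mlr(X, y)
print(coef)  # approximately [4.2, -1.0, 0.5]
```

The normal equations are adequate for a handful of descriptors, but they square the condition number of the design matrix; with many correlated quantum chemical descriptors, a QR- or SVD-based solver such as `numpy.linalg.lstsq` is the safer choice.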
Funding: This work was financially supported by the National Key Research and Development Program of China (2021YFF1201400), the National Natural Science Foundation of China (22220102001) and the Natural Science Foundation of Zhejiang Province, China (LZ19H300001, LD22H300001).