Abstract
Breast cancer has overtaken lung cancer as the most common cancer worldwide, and its mortality rate remains high. Screening candidate breast cancer drugs with machine learning and intelligent optimization algorithms is therefore important for advancing the development of breast cancer therapies. This paper proposes a method that uses an improved random forest algorithm to build an ERα activity prediction model and to select the 20 molecular descriptors with the greatest influence on biological activity. The model is then used to predict the IC50 values and the corresponding pIC50 values of 50 compounds. In addition, support vector machine (SVM) and AdaBoost binary classifiers are used to predict five ADMET properties of the compounds (Caco-2, CYP3A4, hERG, HOB, and MN) separately, yielding an ADMET classification prediction model. Finally, a compound screening model is built with the bald eagle search algorithm, and the black hawk search algorithm is used to integrate it with the two preceding models, addressing the resulting complex numerical optimization problems and determining feasible ranges for the drug operating variables. Experimental results show that the proposed prediction models achieve high accuracy and can be applied to the development of anti-breast-cancer drugs.
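The abstract refers to IC50 values and their corresponding pIC50 values without stating the relation between them. By the standard definition, pIC50 is the negative base-10 logarithm of IC50 expressed in mol/L. The following is a minimal sketch of that conversion, assuming IC50 is reported in nanomolar (the abstract does not state the unit); the function names are hypothetical and are not taken from the paper.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nM to pIC50 = -log10(IC50 in mol/L).

    Assumption: the input unit is nanomolar (1 nM = 1e-9 mol/L),
    so -log10(ic50_nm * 1e-9) simplifies to 9 - log10(ic50_nm).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


def ic50_nm_from_pic50(pic50: float) -> float:
    """Inverse conversion: recover IC50 in nM from a pIC50 value."""
    return 10.0 ** (9.0 - pic50)
```

Under this assumption, an IC50 of 1000 nM (1 µM) corresponds to a pIC50 of 6, and a lower IC50 (a more potent compound) gives a higher pIC50.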
Source
《建模与仿真》
2023, Issue 4, pp. 3930-3942 (13 pages)
Modeling and Simulation