Application of random forest based on semi-automatic parameter adjustment for optimization of anti-breast cancer drugs.

Author information

Liu Jiajia, Zhou Zhihui, Kong Shanshan, Ma Zezhong

Affiliations

College of Science, North China University of Science and Technology, Tangshan, China.

Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, China.

Publication information

Front Oncol. 2022 Jul 22;12:956705. doi: 10.3389/fonc.2022.956705. eCollection 2022.

Abstract

Optimizing drug properties during cancer drug development is important for saving research and development time and cost. To obtain anti-breast cancer drug candidates with good biological activity, this paper collected 1974 compounds. First, the 20 molecular descriptors with the greatest influence on biological activity were screened using XGBoost-based feature selection. Second, on this basis, pIC50 values were taken as the feature data and a variety of machine learning algorithms were compared in order to select the most suitable algorithm for predicting IC50 and pIC50 values. Preliminary results showed that the Random Forest, XGBoost, and Gradient-enhanced algorithms performed well with little difference among them, while the Support Vector Machine performed worst. The semi-automatic parameter adjustment method was then used to tune the parameters of the Random Forest, XGBoost, and Gradient-enhanced algorithms and find their optimal values. The Random Forest algorithm was found to have high accuracy, excellent resistance to overfitting, and stable behavior, with a prediction accuracy of 0.745. Finally, the accuracy of the results was verified by training the model with the preliminarily selected data. This provides an innovative solution for optimizing the properties of anti-breast cancer drugs and can better support their early research and development.
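
The abstract refers to both IC50 and pIC50 values without stating how they relate. By the standard definition, pIC50 is the negative base-10 logarithm of IC50 expressed in mol/L. The minimal sketch below illustrates that conversion. It assumes IC50 is recorded in nanomolar (the abstract does not give the unit), and the function names are hypothetical, not taken from the paper.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert IC50 in nanomolar (assumed unit) to pIC50.

    pIC50 = -log10(IC50 in mol/L); since 1 nM = 1e-9 mol/L,
    this equals 9 - log10(IC50 in nM).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Inverse conversion: recover IC50 in nanomolar from pIC50."""
    return 10.0 ** (9.0 - pic50)


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    for ic50 in (10.0, 1000.0, 50000.0):
        print(f"IC50 = {ic50:g} nM, pIC50 = {ic50_nm_to_pic50(ic50):.2f}")
```

Because a smaller IC50 maps to a larger pIC50, a more potent compound has a higher pIC50, and the logarithmic scale compresses the wide range that raw IC50 values typically span.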
