A deep learning model based on the BERT pre-trained model to predict the antiproliferative activity of anti-cancer chemical compounds.

Author information

Torabi M, Haririan I, Foroumadi A, Ghanbari H, Ghasemi F

Affiliations

Biosensor Research Centre, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran.

Department of Pharmaceutics, Faculty of Pharmacy, Tehran University of Medical Sciences, Tehran, Iran.

Publication information

SAR QSAR Environ Res. 2024 Nov;35(11):971-992. doi: 10.1080/1062936X.2024.2431486. Epub 2024 Nov 28.

Abstract

The ultimate goal of drug discovery is to identify new compounds with minimal side effects that improve patients' quality of life. Because experimental investigations are expensive and time-consuming, and data are scarce in traditional QSAR studies, deep transfer learning models such as BERT have recently been proposed. This study evaluated the model's performance in predicting the anti-proliferative activity of compounds against five cancer cell lines (HeLa, MCF7, MDA-MB231, PC3, and MDA-MB), using over 3,000 synthesized molecules from PubChem. The results indicated that the model could predict the activity class of designed small molecules with acceptable accuracy for most cell lines, the exceptions being PC3 and MDA-MB. The model was further tested on an in-house dataset, based on IC50 values, of approximately 25 small molecules per cell line. The model accurately predicted the biological activity class for HeLa, and it demonstrated acceptable performance for MCF7 and MDA-MB231, for which accuracy ranged from 0.56 to 0.66. However, the results were less reliable for PC3 and HepG2. In conclusion, the fine-tuned ChemBERTa model shows potential for predicting outcomes on in-house datasets.
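
The per-cell-line evaluation described in the abstract can be illustrated with a minimal sketch. It assumes that activity classes are obtained by thresholding IC50 values and that accuracy is the fraction of correctly predicted classes. The abstract does not state the cutoff or the class definitions, so the 10 µM threshold, the function names, and the example values below are all hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

# Hypothetical cutoff: the abstract does not say how IC50 values map to classes.
ACTIVE_THRESHOLD_UM = 10.0


def ic50_to_class(ic50_um: float, threshold_um: float = ACTIVE_THRESHOLD_UM) -> int:
    """Return 1 (active) if IC50 is at or below the threshold, else 0 (inactive)."""
    return 1 if ic50_um <= threshold_um else 0


def accuracy(true_labels: List[int], predicted_labels: List[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_cell_line_accuracy(
    records: Dict[str, List[Tuple[float, int]]]
) -> Dict[str, float]:
    """Map each cell line to the accuracy of its predicted classes.

    `records` maps a cell line name to (measured IC50 in µM, predicted class) pairs.
    """
    result = {}
    for cell_line, pairs in records.items():
        true_labels = [ic50_to_class(ic50) for ic50, _ in pairs]
        predicted = [pred for _, pred in pairs]
        result[cell_line] = accuracy(true_labels, predicted)
    return result


# Made-up illustrative values, not data from the study.
example = {
    "HeLa": [(2.5, 1), (40.0, 0), (8.0, 1), (15.0, 1)],
    "MCF7": [(5.0, 0), (60.0, 0)],
}
print(per_cell_line_accuracy(example))
```

With roughly 25 molecules per cell line, each misclassified compound shifts accuracy by about 0.04, which is worth keeping in mind when comparing values across cell lines.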
