Advancing promiscuous aggregating inhibitor analysis with intelligent machine learning classification.

Author information

Wang Luxuan, Ji Beihong, Zhai Jingchen, Wang Junmei

Affiliation

Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St., Pittsburgh, PA 15261, United States.

Publication information

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf205.

Abstract

Small molecules have been playing a crucial role in drug discovery; however, some exhibit nonspecific inhibitory effects during hit screening due to the formation of colloidal aggregators. Such false positives often lead to significant research costs and time investment. Therefore, to identify potential aggregating compounds efficiently and accurately at an early stage of drug discovery, we employed several machine learning techniques to develop classification models for identifying promiscuous aggregating inhibitors. Using a training dataset of 10 000 aggregators and 10 000 nonaggregators, models were trained by combining four different molecular representations with various machine learning algorithms. We found that the best-performing model is the one that employs path-based FP2 fingerprints in conjunction with the cubic support vector machine algorithm, which achieved the highest accuracy and area under the receiver operating characteristic curve values for both the validation and test datasets while maintaining high sensitivity and specificity levels (>0.93). Additionally, we have proposed a new model interpretation method, global sensitivity analysis (GSA), to complement the well-recognized SHapley Additive exPlanations analysis. Several comparative studies have shown that GSA is a time-efficient and accurate approach for identifying crucial descriptors that contribute to model prediction, especially in the scenario where the dataset contains a substantial number of data entries with a limited set of descriptors. Our models as well as GSA findings can provide useful guidance on screening library design to minimize false positives.
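The abstract reports accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) for the classifiers. As a minimal sketch of how these quantities are conventionally defined for a binary aggregator/nonaggregator task, the Python below computes them from labels and scores. The label encoding (1 = aggregator, 0 = nonaggregator), the 0.5 decision threshold, the function names, and the toy data are all assumptions made for illustration; none of them come from the paper, and the numbers are not the paper's results.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), assuming 1 = aggregator and 0 = nonaggregator."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true aggregators that are flagged as aggregators."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true nonaggregators that are left unflagged."""
    return tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all compounds classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (pairwise rank formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
acc = accuracy(tp, tn, fp, fn)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens}, specificity={spec}, accuracy={acc}, AUC={auc}")
```

The pairwise AUC loop is quadratic in the number of compounds, which is fine for a toy example; for a dataset on the scale described in the abstract, a sort-based rank computation would be the more practical choice.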

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3549/12056367/17b27ecd1dd8/bbaf205ga1.jpg
