Advancing promiscuous aggregating inhibitor analysis with intelligent machine learning classification.

Authors

Wang Luxuan, Ji Beihong, Zhai Jingchen, Wang Junmei

Affiliation

Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, 3501 Terrace St., Pittsburgh, PA 15261, United States.

Publication

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf205.

DOI: 10.1093/bib/bbaf205
PMID: 40329861
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12056367/

Abstract

Small molecules have been playing a crucial role in drug discovery; however, some exhibit nonspecific inhibitory effects during hit screening due to the formation of colloidal aggregators. Such false positives often lead to significant research costs and time investment. Therefore, to identify potential aggregating compounds efficiently and accurately at an early stage of drug discovery, we employed several machine learning techniques to develop classification models for identifying promiscuous aggregating inhibitors. Using a training dataset of 10 000 aggregators and 10 000 nonaggregators, models were trained by combining four different molecular representations with various machine learning algorithms. We found that the best-performing model is the one that employs path-based FP2 fingerprints in conjunction with the cubic support vector machine algorithm, which achieved the highest accuracy and area under the receiver operating characteristic curve values for both the validation and test datasets while maintaining high sensitivity and specificity levels (>0.93). Additionally, we have proposed a new model interpretation method, global sensitivity analysis (GSA), to complement the well-recognized SHapley Additive exPlanations analysis. Several comparative studies have shown that GSA is a time-efficient and accurate approach for identifying crucial descriptors that contribute to model prediction, especially in the scenario where the dataset contains a substantial number of data entries with a limited set of descriptors. Our models as well as GSA findings can provide useful guidance on screening library design to minimize false positives.
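
The abstract reports accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) for a binary aggregator/nonaggregator classifier. As a reading aid, the sketch below shows how these four quantities are conventionally computed from labels and scores. It is not the authors' code: the function names, the label encoding (1 = aggregator, 0 = nonaggregator), and the pairwise AUC formulation are assumptions made for illustration.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity for binary labels.

    Assumed encoding (not from the paper): 1 = aggregator, 0 = nonaggregator.
    Both classes must be present in y_true, otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # aggregators correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nonaggregators correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # nonaggregators wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # aggregators missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n_pos * n_neg), which is
    fine for small examples but slow for very large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For instance, calling `classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])` counts two true positives, two true negatives, one false positive, and one false negative, so sensitivity and specificity are both 2/3. The abstract's ">0.93" claim refers to these same two ratios on the authors' validation and test sets.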

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3549/12056367/67c35835b21c/bbaf205f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3549/12056367/17b27ecd1dd8/bbaf205ga1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3549/12056367/2ce0b49ce0f4/bbaf205f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3549/12056367/57f3d61939b4/bbaf205f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3549/12056367/8f12db4567de/bbaf205f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3549/12056367/14e5140e27ff/bbaf205f4.jpg
