InertDB as a generative AI-expanded resource of biologically inactive small molecules from PubChem.

Authors

An Seungchan, Lee Yeonjin, Gong Junpyo, Hwang Seokyoung, Park In Guk, Cho Jayhyun, Lee Min Ju, Kim Minkyu, Kang Yun Pyo, Noh Minsoo

Affiliation

College of Pharmacy, Natural Products Research Institute, Seoul National University, Seoul, 08826, Republic of Korea.

Publication

J Cheminform. 2025 Apr 10;17(1):49. doi: 10.1186/s13321-025-00999-1.

DOI: 10.1186/s13321-025-00999-1
PMID: 40211375
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11983867/

Abstract

The development of robust artificial intelligence (AI)-driven predictive models relies on high-quality, diverse chemical datasets. However, the scarcity of negative data and a publication bias toward positive results often hinder accurate biological activity prediction. To address this challenge, we introduce InertDB, a comprehensive database comprising 3,205 curated inactive compounds (CICs) identified through rigorous review of over 4.6 million compound records in PubChem. CIC selection prioritized bioassay diversity, determined using natural language processing (NLP)-based clustering metrics, while ensuring minimal biological activity across all evaluated bioassays. Notably, 97.2% of CICs adhere to the Rule of Five, a proportion significantly higher than that of the overall PubChem dataset. To further expand the chemical space, InertDB also features 64,368 generated inactive compounds (GICs) produced using a deep generative AI model trained on the CIC dataset. Compared to conventional approaches such as random sampling or property-matched decoys, InertDB significantly improves predictive AI performance, particularly for phenotypic activity prediction, by providing reliable inactive compound sets.

Scientific contributions

InertDB addresses a critical gap in AI-driven drug discovery by providing a comprehensive repository of biologically inactive compounds, effectively resolving the scarcity of negative data that limits prediction accuracy and model reliability. By leveraging language model-based bioassay diversity metrics and generative AI, InertDB integrates rigorously curated inactive compounds with an expanded chemical space. InertDB serves as a valuable alternative to random sampling and decoy generation, offering improved training datasets and enhancing the accuracy of phenotypic pharmacological activity prediction.
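
The abstract reports the share of CICs that adhere to the Rule of Five but does not spell out the check itself. The sketch below shows one common reading of Lipinski's criteria (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) applied to precomputed descriptors. The `Descriptors` class, the function names, the `max_violations` parameter, and the example values are all hypothetical and are not taken from InertDB. In particular, whether the authors allow zero or one violation is not stated in the abstract, so the strict default here is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float        # molecular weight in Da
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int       # number of hydrogen-bond donors
    h_bond_acceptors: int    # number of hydrogen-bond acceptors


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four Lipinski thresholds are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_ro5(d: Descriptors, max_violations: int = 0) -> bool:
    """Assumption: strict by default; pass max_violations=1 for the lenient variant."""
    return ro5_violations(d) <= max_violations


def ro5_fraction(compounds: Iterable[Descriptors], max_violations: int = 0) -> float:
    """Fraction of compounds that pass the Rule of Five check."""
    items = list(compounds)
    if not items:
        raise ValueError("no compounds given")
    passed = sum(passes_ro5(c, max_violations) for c in items)
    return passed / len(items)


# Illustrative values only: a small aspirin-like compound and a large, lipophilic one.
small = Descriptors(mol_weight=180.2, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Descriptors(mol_weight=800.0, logp=6.5, h_bond_donors=6, h_bond_acceptors=12)

print(ro5_violations(small), ro5_violations(large))
print(ro5_fraction([small, large]))
```

In practice the descriptors would come from a cheminformatics toolkit rather than being typed in by hand, and the resulting fraction depends on how that toolkit counts donors and acceptors.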

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c60/11983867/570957efb81c/13321_2025_999_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c60/11983867/437e9cd2cf52/13321_2025_999_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c60/11983867/250a813ab5a5/13321_2025_999_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c60/11983867/8378b4331808/13321_2025_999_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2c60/11983867/a292252ee15a/13321_2025_999_Fig5_HTML.jpg
