Current strategies to address data scarcity in artificial intelligence-based drug discovery: A comprehensive review.

Affiliations

Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India.

Computer Aided Drug Design Center, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India.

Publication information

Comput Biol Med. 2024 Sep;179:108734. doi: 10.1016/j.compbiomed.2024.108734. Epub 2024 Jul 3.

DOI:10.1016/j.compbiomed.2024.108734

PMID:38964243

Abstract

Artificial intelligence (AI) plays a vital role in computer-aided drug design (CADD). Its development has been accelerated by the growing use of machine learning (ML), mainly deep learning (DL), and by advances in computing hardware and software. As a result, initial doubts about applying AI to drug discovery have been dispelled, bringing significant benefits to medicinal chemistry. At the same time, it is crucial to recognize that AI is still in its infancy and faces limitations that must be addressed to harness its full potential in drug discovery. Notable limitations include insufficient, unlabeled, and non-uniform data; the resemblance of some AI-generated molecules to existing molecules; the unavailability of adequate benchmarks; intellectual property rights (IPR) hurdles in data sharing; poor understanding of biology; a focus on proxy data and ligands; and the lack of holistic methods for representing input (molecular structures) that would avoid pre-processing of input molecules (feature engineering). Input data is the major component of AI infrastructure: besides a few other factors, most successes of AI-driven efforts to improve drug discovery depend on the quality and quantity of the data used to train and test AI algorithms. Data-hungry DL approaches, in particular, may fail to live up to their promise without sufficient data. The current literature suggests several methods that, to a certain extent, handle low-data settings effectively and yield better output from AI models in drug discovery. These include transfer learning (TL), active learning (AL), single or one-shot learning (OSL), multi-task learning (MTL), data augmentation (DA), and data synthesis (DS). A different method, federated learning (FL), enables sharing of proprietary data on a common platform to train ML models without compromising data privacy. In this review, we compare and discuss these methods, their recent applications, and their limitations when modeling small-molecule data to improve the output of AI methods in drug discovery. The article also summarizes some other novel methods for handling inadequate data.
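To make the federated learning idea concrete, the following is a minimal sketch of federated averaging on a toy one-dimensional linear model. It is not taken from the review: the function names (`local_update`, `federated_average`, `train_federated`), the learning rate, the round counts, and the toy data are all assumptions chosen for illustration. The point it shows is that each site trains on its own data and returns only model weights, so the raw records never leave the site.

```python
# Hypothetical sketch of federated averaging for the model y = w*x + b.
# Every name, hyperparameter, and data point here is an illustrative assumption.

def local_update(weights, data, lr=0.05, epochs=20):
    """Run gradient descent on one site's private (x, y) pairs."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y      # prediction error for this record
            grad_w += 2 * err * x      # d(err^2)/dw
            grad_b += 2 * err          # d(err^2)/db
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def federated_average(site_weights, site_sizes):
    """Combine site models, weighting each by its number of records."""
    total = sum(site_sizes)
    w = sum(wt[0] * n for wt, n in zip(site_weights, site_sizes)) / total
    b = sum(wt[1] * n for wt, n in zip(site_weights, site_sizes)) / total
    return w, b

def train_federated(sites, rounds=50):
    """Alternate local training and server-side averaging."""
    weights = (0.0, 0.0)
    sizes = [len(d) for d in sites]
    for _ in range(rounds):
        # Only weights cross the site boundary; the data lists stay local.
        updates = [local_update(weights, d) for d in sites]
        weights = federated_average(updates, sizes)
    return weights

# Toy private datasets, both drawn from y = 2x + 1.
site_a = [(0.0, 1.0), (1.0, 3.0)]
site_b = [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
w, b = train_federated([site_a, site_b])
print(round(w, 2), round(b, 2))
```

In a real deployment the weights would belong to a much larger model and would typically be protected further (for example with secure aggregation), which this sketch leaves out.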

Similar articles

Federated Learning in Glaucoma: A Comprehensive Review and Future Perspectives.

Ophthalmol Glaucoma. 2025 Jan-Feb;8(1):92-105. doi: 10.1016/j.ogla.2024.08.004. Epub 2024 Aug 29.

Artificial Intelligence and Machine Learning Technology Driven Modern Drug Discovery and Development.

Int J Mol Sci. 2023 Jan 19;24(3):2026. doi: 10.3390/ijms24032026.

Artificial intelligence to deep learning: machine intelligence approach for drug discovery.

Mol Divers. 2021 Aug;25(3):1315-1360. doi: 10.1007/s11030-021-10217-3. Epub 2021 Apr 12.

Artificial intelligence in drug discovery: recent advances and future perspectives.

Expert Opin Drug Discov. 2021 Sep;16(9):949-959. doi: 10.1080/17460441.2021.1909567. Epub 2021 Apr 2.

The FeatureCloud Platform for Federated Learning in Biomedicine: Unified Approach.

J Med Internet Res. 2023 Jul 12;25:e42621. doi: 10.2196/42621.

CADD, AI and ML in drug discovery: A comprehensive review.

Eur J Pharm Sci. 2023 Feb 1;181:106324. doi: 10.1016/j.ejps.2022.106324. Epub 2022 Nov 5.

Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery.

Chem Rev. 2019 Sep 25;119(18):10520-10594. doi: 10.1021/acs.chemrev.8b00728. Epub 2019 Jul 11.

Machine Learning and Artificial Intelligence: A Paradigm Shift in Big Data-Driven Drug Design and Discovery.

Curr Top Med Chem. 2022;22(20):1692-1727. doi: 10.2174/1568026622666220701091339.

Artificial intelligence for small molecule anticancer drug discovery.

Expert Opin Drug Discov. 2024 Aug;19(8):933-948. doi: 10.1080/17460441.2024.2367014. Epub 2024 Jun 18.

Cited by

Deep learning for property prediction of natural fiber polymer composites.

Sci Rep. 2025 Jul 30;15(1):27837. doi: 10.1038/s41598-025-10841-1.

In Silico Validation of AI-Assisted Drugs in Healthcare.

Methods Mol Biol. 2025;2952:445-458. doi: 10.1007/978-1-0716-4690-8_24.

AI-Based Drug Discovery and Design for Different Genetic Designs.

Methods Mol Biol. 2025;2952:125-148. doi: 10.1007/978-1-0716-4690-8_8.

Bibliometrics Analysis and Knowledge Mapping of Fragment-Based Drug Design Research: Trends from 2015 to 2024.

Drug Des Devel Ther. 2025 May 22;19:4245-4262. doi: 10.2147/DDDT.S518489. eCollection 2025.

The Cytotoxic Activity of Secondary Metabolites from Marine-Derived spp.: A Review (2018-2024).

Mar Drugs. 2025 Apr 30;23(5):197. doi: 10.3390/md23050197.

Potential Benefits of In Silico Methods: A Promising Alternative in Natural Compound's Drug Discovery and Repurposing for HBV Therapy.

Pharmaceuticals (Basel). 2025 Mar 16;18(3):419. doi: 10.3390/ph18030419.

Investigating Potential Anti-Bacterial Natural Products Based on Ayurvedic Formulae Using Supervised Network Analysis and Machine Learning Approaches.

Pharmaceuticals (Basel). 2025 Jan 30;18(2):192. doi: 10.3390/ph18020192.

Computational Design and Optimization of Multi-Compound Multivesicular Liposomes for Co-Delivery of Traditional Chinese Medicine Compounds.

AAPS PharmSciTech. 2025 Feb 11;26(2):61. doi: 10.1208/s12249-025-03042-6.

Artificial Intelligence in Natural Product Drug Discovery: Current Applications and Future Perspectives.

J Med Chem. 2025 Feb 27;68(4):3948-3969. doi: 10.1021/acs.jmedchem.4c01257. Epub 2025 Feb 6.

Fuzz Testing Molecular Representation Using Deep Variational Anomaly Generation.

J Chem Inf Model. 2025 Feb 24;65(4):1911-1927. doi: 10.1021/acs.jcim.4c01876. Epub 2025 Feb 5.
