Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India.
Computer Aided Drug Design Center, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India.
Comput Biol Med. 2024 Sep;179:108734. doi: 10.1016/j.compbiomed.2024.108734. Epub 2024 Jul 3.
Artificial intelligence (AI) has played a vital role in computer-aided drug design (CADD). This development has been further accelerated by the increasing use of machine learning (ML), mainly deep learning (DL), and by advances in computing hardware and software. As a result, initial doubts about the application of AI in drug discovery have been dispelled, leading to significant benefits in medicinal chemistry. At the same time, it is crucial to recognize that AI is still in its infancy and faces several limitations that need to be addressed to harness its full potential in drug discovery. Notable limitations include insufficient, unlabeled, and non-uniform data; the resemblance of some AI-generated molecules to existing molecules; the unavailability of adequate benchmarks; intellectual property rights (IPRs)-related hurdles in data sharing; a poor understanding of biology; a focus on proxy data and ligands; and a lack of holistic methods for representing the input (molecular structures) that would avoid pre-processing of input molecules (feature engineering). The major component of AI infrastructure is input data, as most successes of AI-driven efforts to improve drug discovery depend, besides a few other factors, on the quality and quantity of the data used to train and test AI algorithms. Additionally, data-hungry DL approaches may fail to live up to their promise without sufficient data. The current literature suggests a few methods that, to a certain extent, effectively handle low data and improve the output of AI models in the context of drug discovery. These include transfer learning (TL), active learning (AL), single- or one-shot learning (OSL), multi-task learning (MTL), data augmentation (DA), and data synthesis (DS). A different method, federated learning (FL), enables proprietary data held by different parties to be used on a common platform to train an ML model without compromising data privacy.
In this review, we compare and discuss these methods, their recent applications, and their limitations in modeling small-molecule data to improve the output of AI methods in drug discovery. The article also summarizes some other novel methods for handling inadequate data.
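To illustrate how federated learning trains a shared model without pooling data, the following is a minimal sketch of federated averaging (FedAvg), one common FL scheme. It assumes a toy one-feature linear model and two hypothetical data holders; it is not the method of any study reviewed here, and the names local_update and fed_avg are illustrative only. Each client runs gradient descent on its own data, and only the resulting weights are sent to the server, which averages them weighted by each client's sample count.

```python
"""Minimal federated averaging (FedAvg) sketch for a 1-D linear model y = w*x + b.

Assumption: each "client" is a hypothetical data holder whose (x, y) pairs
are only read inside local_update(); only model weights are returned/shared.
"""
from typing import List, Tuple

Dataset = List[Tuple[float, float]]


def local_update(w: float, b: float, data: Dataset,
                 lr: float = 0.05, epochs: int = 20) -> Tuple[float, float]:
    """Full-batch gradient descent on mean squared error using only local data."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fed_avg(clients: List[Dataset], rounds: int = 50) -> Tuple[float, float]:
    """Server loop: broadcast global weights, collect local updates,
    and average them weighted by each client's sample count."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        updates = [local_update(w, b, d) for d in clients]
        w = sum(len(d) * uw for d, (uw, _) in zip(clients, updates)) / total
        b = sum(len(d) * ub for d, (_, ub) in zip(clients, updates)) / total
    return w, b


if __name__ == "__main__":
    # Two toy clients whose private data both follow y = 2x + 1.
    client_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
    client_b = [(0.5, 2.0), (1.5, 4.0)]
    w, b = fed_avg([client_a, client_b])
    print(f"w={w:.2f}, b={b:.2f}")  # prints w=2.00, b=1.00
```

Only model weights cross client boundaries in this sketch; production FL frameworks typically add further safeguards, such as secure aggregation, which are omitted here.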