Harvard College, Harvard University, Cambridge, USA.
Global Head of Scientific Informatics, Takeda Pharmaceuticals, Cambridge, MA, USA.
Expert Opin Drug Discov. 2021 Sep;16(9):1045-1056. doi: 10.1080/17460441.2021.1901685. Epub 2021 Mar 19.
Artificial intelligence (AI) has seen a massive resurgence in recent years, with wide successes in computer vision, natural language processing, and games. Creating similarly robust and accurate AI models for ADME/Tox endpoint and activity prediction would be revolutionary for drug discovery pipelines. There have been numerous demonstrations of successful applications, but a key challenge remains: how generalizable are these predictive models?
The authors summarize the currently promising components of AI models in the context of early drug discovery, where ADME/Tox endpoint and activity prediction is the main driver of the iterative drug design process. They then review applicability domains and dataset construction considerations, which determine the generalizability bottlenecks for AI deployment. Finally, they review the role of promising learning frameworks (multitask, transfer, and meta learning) that leverage auxiliary data to overcome issues of generalizability.
The authors conclude that the most promising direction toward integrating reliable and informative AI models into the drug discovery pipeline is a combination of learned feature representations, deep learning, and novel learning frameworks. Such a solution would address the sparse and incomplete datasets that are available for key endpoints related to drug discovery.
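As an illustration of the applicability-domain idea mentioned above, the following is a minimal sketch of one common heuristic: a query compound is treated as inside a model's domain if its nearest training compound is sufficiently similar by Tanimoto similarity. This is not the authors' method. The fingerprint representation (a set of "on" bit indices), the function names, and the 0.4 threshold are all assumptions chosen for illustration; in practice the fingerprints would come from a cheminformatics toolkit and the threshold would be tuned per dataset.

```python
from typing import FrozenSet, Iterable

# Hypothetical representation: a fingerprint is the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    return len(a & b) / len(a | b)


def max_similarity_to_training(
    query: Fingerprint, training: Iterable[Fingerprint]
) -> float:
    """Similarity between the query and its nearest training compound."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def in_domain(
    query: Fingerprint,
    training: Iterable[Fingerprint],
    threshold: float = 0.4,  # illustrative assumption, not a recommended value
) -> bool:
    """Flag the query as in-domain if its nearest neighbor is similar enough."""
    return max_similarity_to_training(query, training) >= threshold


if __name__ == "__main__":
    # Toy fingerprints, made up for this example.
    training_set = [frozenset({1, 2, 3, 4}), frozenset({10, 11, 12})]
    close_query = frozenset({1, 2, 3, 5})  # shares 3 of 5 distinct bits
    far_query = frozenset({20, 21})        # shares no bits with training
    print(in_domain(close_query, training_set))
    print(in_domain(far_query, training_set))
```

Predictions for compounds flagged as out of domain would be treated with more caution, which is one simple way to make a generalizability bottleneck visible at prediction time.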