Independent researcher, United States of America.
Comput Biol Med. 2022 Jun;145:105403. doi: 10.1016/j.compbiomed.2022.105403. Epub 2022 Mar 13.
Recent research on artificial intelligence indicates that machine learning algorithms can auto-generate novel drug-like molecules. Generative models have revolutionized de novo drug discovery, rendering the explorative process more efficient. Several model frameworks and input formats have been proposed to enhance the performance of intelligent algorithms in generative molecular design. This systematic literature review of experimental articles and reviews from the last five years discusses machine learning models, challenges associated with computational molecule design along with proposed solutions, and molecular encoding methods. A query-based search of the PubMed, ScienceDirect, Springer, Wiley Online Library, arXiv, MDPI, bioRxiv, and IEEE Xplore databases yielded 87 studies. Twelve additional studies were identified via citation searching. Among the articles in which machine learning was implemented, six prominent algorithms were identified: long short-term memory recurrent neural networks (LSTM-RNNs), variational autoencoders (VAEs), generative adversarial networks (GANs), adversarial autoencoders (AAEs), evolutionary algorithms, and gated recurrent unit recurrent neural networks (GRU-RNNs). Furthermore, eight central challenges were designated: homogeneity of generated molecular libraries, deficient synthesizability, limited assay data, model interpretability, incapacity for multi-property optimization, incomparability, restricted molecule size, and uncertainty in model evaluation. Molecules were encoded as strings (occasionally augmented using randomization), as 2D graphs, or as 3D graphs. Statistical analysis and visualization were performed to illustrate how approaches to machine learning in de novo drug design have evolved over the past five years. Finally, future opportunities and reservations are discussed.