IEEE/ACM Trans Comput Biol Bioinform. 2022 Jul-Aug;19(4):2092-2110. doi: 10.1109/TCBB.2021.3069040. Epub 2022 Aug 8.
Identifying compound-protein relations (CPRs), which comprise compound-protein interactions (CPIs) and compound-protein affinities (CPAs), is critical to drug development. CPRs are commonly identified through in vitro screening experiments. However, the number of candidate compounds and proteins is massive, and in vitro screening is labor-intensive, expensive, and time-consuming, with high failure rates. To aid experimental drug development, researchers have developed a computational field called virtual screening (VS). VS methods build datasets from experimentally validated biological interaction information and use the physicochemical and structural properties of compounds and target proteins as input to train computational prediction models. Deep learning is now widely used in computer vision and natural language processing, where it has driven major advances. It has also been widely adopted in biomedicine, and deep learning-based CPR prediction has developed rapidly and achieved good results. The purpose of this study is to investigate and discuss the latest applications of deep learning techniques in CPR prediction. First, we describe the datasets and feature engineering (i.e., compound and protein representations and descriptors) commonly used in CPR prediction methods. Then, we review and classify recent deep learning approaches to CPR prediction. Next, we perform a comprehensive comparison of the prediction performance of representative methods on classical datasets. Finally, we discuss the current state of the field, including existing challenges and our proposed future directions. We believe that this investigation will provide sufficient references and insight for researchers to understand and develop new deep learning methods to enhance CPR prediction.
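As a rough illustration of the feature-engineering step mentioned above, the following minimal sketch shows one simple way a compound and a protein might be turned into fixed-length integer inputs for a prediction model. It is not taken from any method reviewed in the paper. It assumes the compound is given as a SMILES string and the protein as an amino-acid sequence, and the names `build_vocab`, `encode`, `AMINO_ACIDS`, and the example inputs are hypothetical, chosen only for illustration. Character-level tokenization is a simplification: it splits multi-character SMILES atoms such as `Cl` and `Br`.

```python
from typing import Dict, List

# Assumption: the 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_vocab(symbols: str) -> Dict[str, int]:
    """Map each distinct character to a positive integer id.

    Id 0 is reserved; in this sketch it stands for both padding
    and unknown characters.
    """
    return {ch: i + 1 for i, ch in enumerate(sorted(set(symbols)))}


def encode(seq: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Label-encode a string, truncating or zero-padding to max_len."""
    ids = [vocab.get(ch, 0) for ch in seq[:max_len]]
    return ids + [0] * (max_len - len(ids))


# Hypothetical example inputs (illustrative only, not from any dataset).
smiles = "CC(=O)O"   # acetic acid
protein = "MKT"      # a three-residue fragment

protein_vocab = build_vocab(AMINO_ACIDS)
smiles_vocab = build_vocab(smiles)  # in practice, built from the whole dataset

compound_ids = encode(smiles, smiles_vocab, max_len=8)
protein_ids = encode(protein, protein_vocab, max_len=5)

print(compound_ids)
print(protein_ids)
```

Integer sequences of this kind are typically passed to an embedding layer. Other representations, such as molecular fingerprints or graphs, would need a cheminformatics toolkit and are outside this sketch.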