Lim Sangsoo, Lu Yijingxiu, Cho Chang Yun, Sung Inyoung, Kim Jungwoo, Kim Youngkuk, Park Sungjoon, Kim Sun
Bioinformatics Institute, Seoul National University, Seoul, Republic of Korea.
Department of Computer Science and Engineering, College of Engineering, Seoul National University, Seoul, Republic of Korea.
Comput Struct Biotechnol J. 2021 Mar 10;19:1541-1556. doi: 10.1016/j.csbj.2021.03.004. eCollection 2021.
There has recently been rapid progress in computational methods for determining the protein targets of small-molecule drugs, hereafter termed compound-protein interaction (CPI). In this review, we comprehensively survey topics related to the computational prediction of CPI. CPI data have been accumulated and curated to a significant degree, in both quantity and quality. Computational methods have become ever more powerful for analyzing such complex data. Thus, recent improvements in the quality of CPI prediction are due to the use of both sophisticated computational techniques and higher-quality information in the databases. The goal of this article is to review topics related to CPI, from data, formats, and representations to computational models, so that researchers can take full advantage of these resources to develop novel prediction methods. Chemical compound and protein data from various resources are discussed in terms of data formats and encoding schemes. For the CPI methods, we group prediction methods into five categories, ranging from traditional machine learning techniques to state-of-the-art deep learning techniques. In closing, we discuss emerging machine learning topics to help both experimental and computational scientists leverage current knowledge and strategies to develop more powerful and accurate CPI prediction methods.