Data Science Division, INTAGE Healthcare Inc., 2F NREG Midosuji Bldg., 3-5-7 Kawara-Machi, Chuo-ku, Osaka 541-0048, Japan.
Business Development Division, Advanced Technology Department, INTAGE Inc., Akihabara Building, 3 Kanda-Neribeicho, Chiyoda-ku, Tokyo 101-8201, Japan.
Molecules. 2021 Aug 24;26(17):5131. doi: 10.3390/molecules26175131.
A variety of artificial intelligence (AI)-based (machine learning) techniques have been developed for in silico prediction of compound-protein interactions (CPI), one of which is a technique we refer to as chemical genomics-based virtual screening (CGBVS). The main feature of CGBVS is that predictions are computed with a pairwise kernel-based support vector machine (SVM), which gives high prediction accuracy with simple implementation and easy handling. We studied whether the CGBVS technique can identify ligands for targets without ligand information (orphan targets) using data from G protein-coupled receptor (GPCR) families. As the validation method, we tested whether ligand prediction was correct for a virtual orphan GPCR, that is, a selected target whose ligand information was entirely omitted from the training data. We expressed the results of this study as an applicability index and developed a method to determine whether CGBVS can be used to predict GPCR ligands. Validation results showed that prediction accuracy differed greatly among GPCRs, but models using multiple sequence alignment (MSA) as the protein descriptor performed well in terms of overall prediction accuracy. We also found that the type of compound descriptor had a less significant effect on prediction accuracy than the type of protein descriptor. Furthermore, the accuracy of ligand prediction depends on the amount of ligand information available for GPCRs related to the target: prediction accuracy tends to be high when a large amount of ligand information for related proteins is used in training.
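The pairwise-kernel setup and the virtual-orphan validation described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the product of a Tanimoto compound kernel and an RBF protein kernel is one common way to build a pairwise kernel, not necessarily the one used in CGBVS, and the target names, fingerprint bits, and protein descriptor vectors are hypothetical toy values.

```python
from math import exp

# Hypothetical protein descriptors (e.g. features derived from sequences).
PROTEINS = {
    "GPCR_A": (0.1, 0.9),
    "GPCR_B": (0.2, 0.8),
    "GPCR_C": (0.9, 0.1),
}

# Hypothetical compound-protein pairs: fingerprint bits, target, label.
SAMPLES = [
    {"compound": frozenset({1, 2, 3}), "target": "GPCR_A", "label": 1},
    {"compound": frozenset({4, 5}), "target": "GPCR_A", "label": 0},
    {"compound": frozenset({1, 2, 4}), "target": "GPCR_B", "label": 1},
    {"compound": frozenset({5, 6}), "target": "GPCR_B", "label": 0},
    {"compound": frozenset({6, 7, 8}), "target": "GPCR_C", "label": 1},
]


def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    return exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def pairwise_kernel(s, t):
    """Similarity of two compound-protein pairs as a product of kernels."""
    compound_sim = tanimoto(s["compound"], t["compound"])
    protein_sim = rbf(PROTEINS[s["target"]], PROTEINS[t["target"]])
    return compound_sim * protein_sim


def gram(rows, cols):
    """Kernel matrix with one row per item in rows, one column per item in cols."""
    return [[pairwise_kernel(r, c) for c in cols] for r in rows]


def split_virtual_orphan(samples, orphan):
    """Hold out every pair of one target so it has no ligand data in training."""
    train = [s for s in samples if s["target"] != orphan]
    test = [s for s in samples if s["target"] == orphan]
    return train, test


train, test = split_virtual_orphan(SAMPLES, "GPCR_B")
K_train = gram(train, train)  # kernel among training pairs
K_test = gram(test, train)    # kernel between held-out and training pairs
```

Because the held-out target contributes no training pairs, any prediction for it has to come through its protein similarity to the remaining targets, which is consistent with the abstract's observation that accuracy depends on ligand information for related GPCRs. Matrices of this shape could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.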