Möller Lukas, Guerci Lorenzo, Isert Clemens, Atz Kenneth, Schneider Gisbert
Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 4, 8093, Zurich, Switzerland.
ETH Singapore SEC Ltd., 1 CREATE Way, Singapore.
Mol Inform. 2022 Oct;41(10):e2200059. doi: 10.1002/minf.202200059. Epub 2022 Jun 1.
Identifying druggable ligand-binding sites on the surface of macromolecular targets is an important step in structure-based drug discovery. Deep-learning models have been shown to predict ligand-binding sites of proteins successfully. As a step toward predicting binding sites in RNA and RNA-protein complexes, we employ three-dimensional convolutional neural networks. We introduce a dataset-splitting approach to minimize structure-related bias in the training data, and investigate the influence of pre-training the neural network on proteins before fine-tuning on RNA structures. Models that were pre-trained on proteins considerably outperformed models trained exclusively on RNA structures. Overall, 71 % of the known RNA binding sites were correctly located within 4 Å of their true centres.
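The reported success rate (sites located within 4 Å of their true centres) can be illustrated with a minimal sketch. The abstract does not state how predicted centres are derived or matched to known sites, so the one-prediction-per-site pairing, the inclusive threshold, the function name `success_rate`, and the coordinates below are all assumptions made for illustration, not the authors' evaluation code.

```python
import math


def success_rate(predicted, true, threshold=4.0):
    """Fraction of sites whose predicted centre lies within `threshold`
    angstroms of the true centre.

    Assumes one predicted centre per known site, given in the same order
    (a simplification; the source does not describe the matching step).
    """
    if len(predicted) != len(true):
        raise ValueError("predicted and true must have the same length")
    if not true:
        raise ValueError("at least one site is required")
    hits = sum(
        1 for p, t in zip(predicted, true)
        if math.dist(p, t) <= threshold  # Euclidean distance in angstroms
    )
    return hits / len(true)


# Made-up coordinates (angstroms), purely illustrative.
true_centres = [
    (0.0, 0.0, 0.0),
    (10.0, 10.0, 10.0),
    (5.0, -3.0, 2.0),
    (1.0, 1.0, 1.0),
]
predicted_centres = [
    (1.0, 2.0, 2.0),     # 3.0 from the true centre: hit
    (10.0, 10.0, 15.0),  # 5.0 from the true centre: miss
    (5.0, -3.0, 2.5),    # 0.5 from the true centre: hit
    (1.0, 4.0, 1.0),     # 3.0 from the true centre: hit
]

rate = success_rate(predicted_centres, true_centres)
print(f"{rate:.0%} of sites located within 4 angstroms")
```

With these invented coordinates, three of the four sites fall inside the threshold. A figure such as the 71 % in the abstract would come from applying the same kind of count to held-out RNA structures.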