Sagendorf Jared M, Mitra Raktim, Huang Jiawei, Chen Xiaojiang S, Rohs Remo
Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089 USA.
Present Address: Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158 USA.
Biophys Rev. 2024 Jun 26;16(3):297-314. doi: 10.1007/s12551-024-01201-w. eCollection 2024 Jun.
Protein-nucleic acid (PNA) binding plays critical roles in the transcription, translation, regulation, and three-dimensional organization of the genome. Structural models of proteins bound to nucleic acids (NA) provide insights into the chemical, electrostatic, and geometric properties of the protein structure that give rise to NA binding, but they are scarce relative to models of unbound proteins. We developed a deep learning approach, which we call PNAbind, for predicting PNA binding given the unbound structure of a protein. Our method utilizes graph neural networks to encode the spatial distribution of physicochemical and geometric properties of protein structures that are predictive of NA binding. Using global physicochemical encodings, our models predict the overall binding function of a protein, and using local encodings, they predict the location of individual NA binding residues. Our models can distinguish DNA-binding from RNA-binding specificity, and we show that predictions made on computationally derived protein structures can be used to gain a mechanistic understanding of the chemical and structural features that determine NA recognition. Binding site predictions were validated against benchmark datasets, achieving AUROC scores in the range of 0.92-0.95. We applied our models to the HIV-1 restriction factor APOBEC3G and showed that our model predictions are consistent with and help explain experimental RNA binding data.
The online version contains supplementary material available at 10.1007/s12551-024-01201-w.
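As an aid to reading the AUROC figures reported above, the following is a minimal sketch of how an AUROC score can be computed for per-residue binding predictions. It is not the PNAbind evaluation code: the function name `auroc`, the toy labels, and the toy scores are all hypothetical, and the pairwise (Mann-Whitney) formulation is just one standard way to obtain the metric.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) AUROC.

    labels: 1 for a NA-binding residue, 0 for a non-binding residue.
    scores: predicted binding score per residue (higher = more likely binding).
    Returns the fraction of (binding, non-binding) pairs in which the binding
    residue receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six residues, three of them labeled as binding.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so scores of 0.92-0.95 mean that a randomly chosen binding residue outranks a randomly chosen non-binding residue in roughly 92-95% of pairs.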