Turbocharging protein binding site prediction with geometric attention, inter-resolution transfer learning, and homology-based augmentation.

Affiliations

Deargen, Seoul, Republic of Korea.

SK Life Science, Inc., Paramus, NJ, USA.

Publication information

BMC Bioinformatics. 2024 Sep 20;25(1):306. doi: 10.1186/s12859-024-05923-2.

Abstract

BACKGROUND

Locating small-molecule binding sites in target proteins, at either pocket or residue resolution, is critical in many drug-discovery scenarios. Because such binding sites are not always easy to find with conventional methods, various deep learning methods that predict binding sites from protein structures have been developed in recent years. Existing deep-learning-based methods have several limitations, including (1) the inefficiency of CNN-only architectures, (2) loss of information due to excessive post-processing, and (3) under-utilization of available data sources.

METHODS

We present a new model architecture and training method that address these problems. First, by layering geometric self-attention units on top of residue-level 3D CNN outputs, our model overcomes the limitations of CNN-only architectures. Second, by using residues and pockets instead of voxels as the fundamental units of computation, our method reduces the information loss from post-processing. Lastly, by employing inter-resolution transfer learning and homology-based augmentation, our method makes substantially fuller use of available data sources.
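The abstract does not give the exact form of the geometric self-attention units. As a rough illustration of the general idea only, the sketch below applies self-attention over per-residue feature vectors (standing in for residue-level CNN outputs) and penalizes attention between residues that are far apart in 3D space. The function name `geometric_self_attention`, the `length_scale` parameter, the linear distance penalty, and the absence of learned query/key/value projections are all assumptions made for brevity, not the authors' design.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def geometric_self_attention(features, coords, length_scale=8.0):
    """Hypothetical distance-biased self-attention over residues.

    features: one feature vector per residue (e.g. pooled CNN outputs).
    coords:   one (x, y, z) position per residue, in angstroms.
    length_scale: assumed constant controlling how quickly attention
                  decays with distance between residues.
    Returns one updated feature vector per residue.
    """
    dim = len(features[0])
    updated = []
    for query, q_pos in zip(features, coords):
        logits = []
        for key, k_pos in zip(features, coords):
            # Content term: scaled dot product (identity projections assumed).
            content = sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            # Geometric term: penalize pairs that are far apart in space.
            distance = math.dist(q_pos, k_pos)
            logits.append(content - distance / length_scale)
        weights = softmax(logits)
        # Each output is a weighted average of all residue features.
        updated.append(
            [sum(w * v[c] for w, v in zip(weights, features)) for c in range(dim)]
        )
    return updated


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
    for row in geometric_self_attention(feats, xyz, length_scale=1.0):
        print(row)
```

In this toy input the third residue sits 100 angstroms away from the other two, so its attention weight falls almost entirely on itself and its features are left essentially unchanged, while the two nearby residues mix with each other. A trained model would use learned projections and likely a richer geometric encoding.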

RESULTS

The proposed method significantly outperformed all state-of-the-art baselines at both resolutions, pocket and residue. An ablation study showed that the proposed architecture, transfer learning, and homology-based augmentation are each necessary for achieving optimal performance. A case study on human serum albumin further showed that our model identifies the protein's multiple binding sites better than existing methods.

CONCLUSIONS

We believe that our contribution to the literature is twofold. First, we introduce a novel computational method for binding site prediction with practical applications, substantiated by its strong performance across diverse benchmarks and case studies. Second, the innovative aspects of our method (specifically, the design of the model architecture, inter-resolution transfer learning, and homology-based augmentation) can serve as useful components for future work.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a621/11416008/238164a5e726/12859_2024_5923_Fig1_HTML.jpg
