Large Molecule Research, Sanofi, Cambridge, MA 02141, United States.
Data & Data Science, Sanofi, Cambridge, MA 02141, United States.
Bioinformatics. 2023 Apr 3;39(4). doi: 10.1093/bioinformatics/btad196.
A protein can be represented in several forms, including its 1D sequence, 3D atom coordinates, and molecular surface. A protein surface contains rich structural and chemical features directly related to the protein's function, such as its ability to interact with other molecules. While many methods have been developed for comparing proteins using sequence and structural representations, computational methods based on the molecular surface representation remain limited.
Here, we describe "Surface ID," a geometric deep learning system for high-throughput surface comparison based on geometric and chemical features. Surface ID offers a novel grouping and alignment algorithm useful for clustering proteins by function, for visualization, and for in silico screening of potential binding partners to a target molecule. Our method demonstrates top performance in surface similarity assessment, indicating great potential for protein functional annotation, a major need in protein engineering and therapeutic design.
Source code for the Surface ID model, trained weights, and inference script are available at https://github.com/Sanofi-Public/LMR-SurfaceID.
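To make the idea of descriptor-based surface comparison and grouping concrete, the following is a minimal sketch and not the Surface ID implementation. It assumes each protein surface has already been encoded as a set of fixed-length patch descriptors (here plain NumPy arrays). The function names, the symmetric best-match cosine score, the greedy grouping, and the 0.9 threshold are all hypothetical choices made for illustration; the released repository may work quite differently.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def surface_similarity(desc_a, desc_b):
    """Symmetric best-match score between two sets of patch descriptors.

    Hypothetical scoring rule: every patch is paired with its most similar
    patch on the other surface, and the two directional means are averaged.
    """
    sim = cosine_similarity_matrix(desc_a, desc_b)
    return 0.5 * (sim.max(axis=1).mean() + sim.max(axis=0).mean())


def group_surfaces(descriptor_sets, threshold=0.9):
    """Greedy grouping (an assumption, not the paper's algorithm).

    Each surface joins the first group whose representative (its first
    member) it matches with a score at or above the threshold; otherwise
    it starts a new group.
    """
    groups = []
    for idx, desc in enumerate(descriptor_sets):
        for group in groups:
            if surface_similarity(descriptor_sets[group[0]], desc) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


# Toy descriptors: surfaces 0 and 1 are identical, surface 2 points the opposite way.
surface_a = np.array([[1.0, 0.0], [0.0, 1.0]])
surface_b = surface_a.copy()
surface_c = -surface_a
groups = group_surfaces([surface_a, surface_b, surface_c])
```

In this toy setting the first two surfaces fall into one group and the third forms its own. A real pipeline would replace the toy arrays with learned descriptors and would likely add a spatial alignment step, which this sketch omits.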