Khalife Sammy, Malliavin Thérèse, Liberti Leo
LIX, CNRS, Ecole Polytechnique, Institut Polytechnique de Paris, Palaiseau 91128, France.
CNRS, Institut Pasteur UMR 3528, Paris 75015, France.
Bioinform Adv. 2021 Nov 29;1(1):vbab038. doi: 10.1093/bioadv/vbab038. eCollection 2021.
The structure of proteins is organized hierarchically, and its basic building blocks are the secondary structure elements: α-helix, β-strand and loop. Determining the secondary structure elements usually requires knowledge of the whole structure. In numerous experimental settings, however, the protein structure is only partially known. Detecting secondary structures in such partial structures is hampered by the lack of information about which residues are connected along the primary sequence.
We introduce a new methodology to estimate the secondary structure elements from the values of local distances and angles between the protein atoms. It relies on a message passing neural network, named Sequoia, which predicts the secondary structure elements automatically. The network takes as input the topology of the given protein graph, where the vertices are protein residues and the edges are weighted by distances and by pseudo-dihedral angles that generalize the backbone angles φ and ψ. Every pair of residues, whether or not the two residues are covalently linked along the primary sequence, is tagged with this distance and angle information. When helices and strands are predicted, Sequoia detects the secondary structure elements automatically with an F1-score above 80% in most cases. In contrast to the approaches classically used in structural biology, such as DSSP, Sequoia captures the variations of geometry at the interface between adjacent secondary structure elements. Owing to its general modeling framework, Sequoia can handle graphs containing only Cα atoms, which is particularly useful for low-resolution structural input and in the context of developments in electron microscopy.
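The abstract describes the network input as a residue graph whose edges carry a distance and a pseudo-dihedral angle. A minimal pure-Python sketch of how such edge features, and one generic message-passing update over them, could be computed from a Cα trace is given here. It is not Sequoia's code: the function names (`dihedral`, `build_edges`, `message_passing_step`, `ideal_helix`), the 8 Å cutoff, the sin/cos angle encoding, the linear-plus-ReLU update and, above all, the pseudo-dihedral definition over Cα(i-1), Cα(i), Cα(j), Cα(j+1) are hypothetical choices for illustration. That definition reduces to the usual Cα virtual torsion when j = i + 1, but it assumes the sequence neighbors of i and j are known.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]
EdgeFeat = Tuple[float, float, float]  # (distance, sin(theta), cos(theta))


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle (radians, in [-pi, pi]) about the p1 -> p2 axis."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


def build_edges(ca: Sequence[Vec], cutoff: float = 8.0) -> Dict[Tuple[int, int], EdgeFeat]:
    """Edge features for residue pairs (i < j) whose CA-CA distance is <= cutoff.

    HYPOTHETICAL pseudo-dihedral: the angle over CA[i-1], CA[i], CA[j], CA[j+1].
    For j == i + 1 it reduces to the usual CA virtual torsion. The first and
    last residues are skipped because they lack a flanking neighbor.
    """
    n = len(ca)
    edges: Dict[Tuple[int, int], EdgeFeat] = {}
    for i in range(1, n - 1):
        for j in range(i + 1, n - 1):
            d = math.dist(ca[i], ca[j])
            if d <= cutoff:
                theta = dihedral(ca[i - 1], ca[i], ca[j], ca[j + 1])
                edges[(i, j)] = (d, math.sin(theta), math.cos(theta))
    return edges


def _matvec(w: List[List[float]], v: Sequence[float]) -> List[float]:
    return [_dot(row, v) for row in w]


def message_passing_step(h: List[List[float]],
                         edges: Dict[Tuple[int, int], EdgeFeat],
                         w_msg: List[List[float]],
                         w_self: List[List[float]]) -> List[List[float]]:
    """One generic MPNN update: h_i' = ReLU(W_self h_i + sum_j W_msg [h_j ; e_ij]).

    Edges are treated as undirected and the same edge feature is sent both ways
    (a simplification: the pseudo-dihedral is not symmetric in i and j).
    """
    out_dim = len(w_self)
    agg = [[0.0] * out_dim for _ in h]
    for (i, j), e in edges.items():
        for dst, src in ((i, j), (j, i)):
            msg = _matvec(w_msg, list(h[src]) + list(e))
            agg[dst] = [a + m for a, m in zip(agg[dst], msg)]
    return [[max(0.0, s + a) for s, a in zip(_matvec(w_self, h[i]), agg[i])]
            for i in range(len(h))]


def ideal_helix(n: int, radius: float = 2.3, rise: float = 1.5,
                turn_deg: float = 100.0) -> List[Vec]:
    """Toy CA trace of an ideal right-handed alpha-helix (illustrative values)."""
    t = math.radians(turn_deg)
    return [(radius * math.cos(k * t), radius * math.sin(k * t), k * rise)
            for k in range(n)]
```

On the toy trace, `build_edges(ideal_helix(10))` gives a consecutive Cα-Cα distance of about 3.83 Å and a consecutive pseudo-dihedral of about +50°, close to the values usually quoted for α-helical Cα traces. A trained model would learn `w_msg` and `w_self`, stack several such steps and finish with a per-residue classifier over helix, strand and loop; those parts are deliberately left out of this sketch.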
Sequoia source code can be found at https://github.com/Khalife/Sequoia with additional documentation.
Supplementary data are available at Bioinformatics Advances online.