Solvent Accessibility of Residues Undergoing Pathogenic Variations in Humans: From Protein Structures to Protein Sequences.

Author information

Savojardo Castrense, Manfredi Matteo, Martelli Pier Luigi, Casadio Rita

Affiliations

Biocomputing Group, Department of Pharmacy and Biotechnologies, University of Bologna, Bologna, Italy.

Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies of the National Research Council, Bari, Italy.

Publication information

Front Mol Biosci. 2021 Jan 7;7:626363. doi: 10.3389/fmolb.2020.626363. eCollection 2020.

Abstract

Solvent accessibility (SASA) is a key feature of proteins for determining their folding and stability. SASA is computed from protein structures with different algorithms, and predicted from protein sequences with machine-learning approaches trained on solved structures. Here we ask to what extent the solvent exposure of a residue can be associated with the pathogenicity of its variation. In this way, the SASA of the wild-type residue acquires a role in the functional annotation of protein single-residue variations (SRVs). By mapping variations onto a curated database of human protein structures, we found that residues targeted by disease-related SRVs are less accessible to solvent than residues involved in polymorphisms. Disease association is not evenly distributed among residue types: SRVs targeting glycine, tryptophan, tyrosine, and cysteine are more frequently disease-associated than others. For all residues, the proportion of disease-related SRVs increases markedly when the wild-type residue is buried and decreases when it is exposed. The extent of the increase depends on the residue type. With the aid of an in-house predictor, based on deep learning and performing at the state of the art, we confirm this tendency by analyzing a large data set of residues subject to variation in 12,494 human protein sequences that still lack a three-dimensional structure (derived from HUMSAVAR). Our data support the notion that solvent accessible surface area is a distinguishing property of residues that undergo variation, and that pathogenicity is more frequently associated with buried residues than with exposed ones.
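The comparison described in the abstract (the proportion of disease-related SRVs among buried versus exposed wild-type residues, broken down by residue type) can be sketched roughly as follows. This is only an illustration, not the authors' pipeline: the function name, the 0.20 relative-accessibility cutoff, and the toy records are all assumptions introduced here, since the abstract does not specify a threshold or a data format.

```python
from collections import defaultdict

# Hypothetical cutoff: residues whose relative solvent accessibility (RSA)
# falls below this value are called "buried". The abstract states no threshold.
BURIED_RSA_CUTOFF = 0.20


def disease_fraction_by_exposure(variants, cutoff=BURIED_RSA_CUTOFF):
    """Return {(residue, state): fraction of disease-related SRVs}.

    `variants` is an iterable of (wild_type_residue, rsa, is_disease) tuples,
    where rsa is in [0, 1] and state is "buried" or "exposed".
    """
    counts = defaultdict(lambda: [0, 0])  # [disease count, total count]
    for residue, rsa, is_disease in variants:
        state = "buried" if rsa < cutoff else "exposed"
        counts[(residue, state)][0] += int(is_disease)
        counts[(residue, state)][1] += 1
    return {key: disease / total for key, (disease, total) in counts.items()}


# Toy records invented purely to exercise the function (NOT data from the study).
toy_variants = [
    ("G", 0.05, True),
    ("G", 0.10, True),
    ("G", 0.60, False),
    ("G", 0.70, True),
    ("K", 0.08, True),
    ("K", 0.55, False),
    ("K", 0.80, False),
]

fractions = disease_fraction_by_exposure(toy_variants)
for key in sorted(fractions):
    print(key, round(fractions[key], 2))
```

With real data, the RSA values would come either from a solved structure or from a sequence-based predictor, and the per-residue fractions for the buried and exposed states could then be compared directly.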

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/47d6/7817970/da1c3da61e92/fmolb-07-626363-g0001.jpg
