Beltran Antoni, Jiang Xiang'er, Shen Yue, Lehner Ben
Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.
BGI Research, Changzhou, China.
Nature. 2025 Jan;637(8047):885-894. doi: 10.1038/s41586-024-08370-4. Epub 2025 Jan 8.
Missense variants that change the amino acid sequences of proteins cause one-third of human genetic diseases. Tens of millions of missense variants exist in the current human population, and the vast majority of these have unknown functional consequences. Here we present a large-scale experimental analysis of human missense variants across many different proteins. Using DNA synthesis and cellular selection experiments, we quantify the effect of more than 500,000 variants on the abundance of more than 500 human protein domains. This dataset reveals that 60% of pathogenic missense variants reduce protein stability. The contribution of stability to protein fitness varies across proteins and diseases and is particularly important in recessive disorders. We combine stability measurements with protein language models to annotate functional sites across proteins. Mutational effects on stability are largely conserved in homologous domains, enabling accurate stability prediction across entire protein families using energy models. Our data demonstrate the feasibility of assaying human protein variants at scale and provide a large consistent reference dataset for clinical variant interpretation and for training and benchmarking of computational methods.