Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Departments of Biophysics and Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
J Mol Biol. 2021 May 28;433(11):166915. doi: 10.1016/j.jmb.2021.166915. Epub 2021 Mar 4.
Deleterious single amino acid variation (SAV) is one of the leading causes of human diseases. Evaluating the functional impact of SAVs is crucial for diagnosis of genetic disorders. We previously developed a deep convolutional neural network predictor, DeepSAV, to evaluate the deleterious effects of SAVs on protein function based on various sequence, structural, and functional properties. DeepSAV scores of rare SAVs observed in the human population are aggregated into a gene-level score called GTS (Gene Tolerance of rare SAVs) that reflects a gene's tolerance to deleterious missense mutations and serves as a useful tool to study gene-disease associations. In this study, we aim to enhance the performance of DeepSAV by using expanded datasets of pathogenic and benign variants, more features, and neural network optimization. We found that multiple sequence alignments built from vertebrate-level orthologs yield better prediction results than those built from mammalian-level orthologs. For multiple sequence alignments built from BLAST searches, optimal performance was achieved with a sequence identity cutoff of 50% to remove distant homologs. The new version of DeepSAV exhibits the best performance among standalone predictors of deleterious effects of SAVs. We developed the DBSAV database (http://prodata.swmed.edu/DBSAV) that reports GTS scores of human genes and DeepSAV scores of SAVs in the human proteome, including pathogenic and benign SAVs, population-level SAVs, and all possible SAVs resulting from single nucleotide variations. This database serves as a useful resource for research on human SAVs and their relationships with protein functions and human diseases.
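The 50% sequence identity cutoff mentioned above can be illustrated with a minimal sketch. It is not the DeepSAV pipeline: the abstract does not say how identity is computed, so this assumes identity is measured against the query over alignment columns where the query is not a gap. The function names `percent_identity` and `filter_alignment` and the toy sequences are hypothetical.

```python
from typing import Dict


def percent_identity(query: str, hit: str) -> float:
    """Percent identity of an aligned hit to the query.

    Assumption: only columns where the query has a residue (not '-')
    are counted; the source does not specify the exact definition.
    """
    if len(query) != len(hit):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for q, h in zip(query, hit):
        if q == "-":
            continue  # skip columns where the query has a gap
        compared += 1
        if q == h:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def filter_alignment(query: str, hits: Dict[str, str],
                     cutoff: float = 50.0) -> Dict[str, str]:
    """Keep only aligned hits whose identity to the query is >= cutoff."""
    return {name: seq for name, seq in hits.items()
            if percent_identity(query, seq) >= cutoff}


# Toy aligned sequences (hypothetical, for illustration only).
query = "MKTAYIAKQR"
hits = {
    "close": "MKTAYIAKQK",       # 9 of 10 positions match
    "borderline": "MKTAYLLLLL",  # 5 of 10 positions match
    "distant": "MAAGGLLLLR",     # 2 of 10 positions match
}
kept = filter_alignment(query, hits, cutoff=50.0)
```

Here `kept` retains the "close" and "borderline" hits and drops "distant". Whether a hit at exactly 50% is kept or removed is a design choice that this sketch resolves in favor of keeping it.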