Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA.
J Mol Biol. 2021 May 28;433(11):166840. doi: 10.1016/j.jmb.2021.166840. Epub 2021 Feb 2.
Numerous human diseases are caused by mutations in genomic sequences. Since amino acid changes affect protein function through mechanisms often predictable from protein structure, the integration of structural and sequence data enables us to estimate with greater accuracy whether and how a given mutation will lead to disease. Publicly available annotated databases enable hypothesis assessment and benchmarking of prediction tools. However, the results are often presented as summary statistics or black-box predictors, without providing full descriptive information. We developed a new semi-manually curated human variant database presenting information on the protein contact map, sequence-to-structure mapping, amino acid identity change, and stability prediction for the popular UniProt database. We found that the profiles of pathogenic and benign missense polymorphisms can be effectively deduced using decision trees and comparative analyses based on the presented dataset. The database is made publicly available through https://zhanglab.ccmb.med.umich.edu/ADDRESS.