Structural Bioinformatics Group, Department of Life Sciences, Imperial College London, London, SW7 2AZ, UK.
Hum Mutat. 2018 Mar;39(3):365-370. doi: 10.1002/humu.23377. Epub 2017 Dec 21.
We analyzed 563,099 common (minor allele frequency, MAF ≥ 0.01) and rare (MAF < 0.01) genetic variants annotated in ExAC and UniProt, together with 26,884 disease-causing variants from ClinVar and UniProt, all occurring in the coding regions of 17,975 human protein-coding genes. Three novel sets of genes were identified: those enriched in rare variants (n = 32 genes), in common variants (n = 282 genes), and in disease-causing variants (n = 800 genes). In their biological and network properties, genes enriched in rare variants are far more similar to genes enriched in disease-causing variants than to genes enriched in common variants. However, in half of the genes enriched in rare variants (AOC2, MAMDC4, ANKHD1, CDC42BPB, SPAG5, TRRAP, TANC2, IQCH, USP54, SRRM2, DOPEY2, and PITPNM1), no disease-causing variants have been identified in major, publicly available databases. Genetic variants in these genes are therefore strong candidates for disease, and their identification in sequencing studies should prompt further in vitro analyses.
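The common/rare split used in the abstract can be illustrated with a minimal sketch. Only the MAF cutoff of 0.01 comes from the abstract; the function names, the `(gene_symbol, maf)` input shape, and the example frequencies are hypothetical, and the sketch does not reproduce the enrichment test used in the study.

```python
from collections import Counter

# Cutoff stated in the abstract: common if MAF >= 0.01, rare if MAF < 0.01.
MAF_THRESHOLD = 0.01


def classify_variant(maf: float) -> str:
    """Label a variant 'common' or 'rare' by its minor allele frequency."""
    # A minor allele frequency lies between 0 and 0.5 by definition.
    if not 0.0 <= maf <= 0.5:
        raise ValueError(f"MAF must lie in [0, 0.5], got {maf}")
    return "common" if maf >= MAF_THRESHOLD else "rare"


def count_by_gene(variants):
    """Tally rare and common variants per gene.

    `variants` is an iterable of (gene_symbol, maf) pairs. This input
    shape is an assumption for illustration, not the ExAC or UniProt format.
    """
    counts = {}
    for gene, maf in variants:
        counts.setdefault(gene, Counter())[classify_variant(maf)] += 1
    return counts


# Made-up frequencies for illustration only; not data from the study.
example = [("TRRAP", 0.0002), ("TRRAP", 0.004), ("TRRAP", 0.03), ("AOC2", 0.01)]
result = count_by_gene(example)
```

Per-gene tallies of this kind would be the input to an enrichment analysis; the statistical test itself is not described in the abstract and is left out here.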