Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA.
Stanley Center for Psychiatric Research, The Broad Institute of Harvard and M.I.T, Cambridge, MA, USA.
Genome Med. 2020 Mar 17;12(1):28. doi: 10.1186/s13073-020-00725-6.
Classifying the pathogenicity of missense variants is a major challenge in clinical practice when diagnosing rare and genetically heterogeneous neurodevelopmental disorders (NDDs). While orthologous gene conservation is commonly employed in variant annotation, approximately 80% of known disease-associated genes belong to gene families. The use of gene family information for disease gene discovery and variant interpretation has not yet been investigated on a genome-wide scale. We empirically evaluate whether paralog-conserved or non-conserved sites in human gene families are important in NDDs.
Gene family information was collected from Ensembl. Paralog-conserved sites were defined based on paralog sequence alignments; 10,068 NDD patients and 2,078 controls were statistically evaluated for de novo variant burden in gene families.
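As an illustration of how paralog-conserved sites might be derived from an alignment, the following minimal sketch flags each residue of a reference gene according to how many aligned paralogs share it. The function name `paralog_conserved_sites`, the `min_identity` threshold, and the toy sequences are assumptions made for this example; the abstract does not state the exact conservation criterion used in the study.

```python
from typing import List


def paralog_conserved_sites(
    alignment: List[str],
    ref_index: int = 0,
    min_identity: float = 1.0,  # assumed threshold, not taken from the study
) -> List[bool]:
    """Flag each residue of the reference gene as paralog-conserved.

    `alignment` holds equal-length aligned protein sequences of one gene
    family, with '-' marking gaps. A reference residue counts as conserved
    when the fraction of sequences (reference included) carrying the same
    residue in that column is at least `min_identity`.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")

    reference = alignment[ref_index]
    flags: List[bool] = []
    for col in range(length):
        ref_residue = reference[col]
        if ref_residue == "-":
            continue  # column is a gap in the reference: no residue to score
        matches = sum(1 for seq in alignment if seq[col] == ref_residue)
        flags.append(matches / len(alignment) >= min_identity)
    return flags


# Toy three-member family (hypothetical sequences).
toy_family = ["MKTA", "MKSA", "M-TA"]
strict_flags = paralog_conserved_sites(toy_family)
relaxed_flags = paralog_conserved_sites(toy_family, min_identity=0.6)
```

With the strict default, only columns identical across every family member are flagged; lowering `min_identity` trades specificity for coverage in larger, more divergent families.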
We demonstrate that disease-associated missense variants are enriched at paralog-conserved sites across all disease groups and inheritance models tested. We developed a gene family de novo enrichment framework that identified 43 exome-wide enriched gene families comprising 98 genes that carry de novo variants in NDD patients. Of these genes, 28 are novel candidate genes for NDD that are brain-expressed and under evolutionary constraint.
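One common way to test for de novo variant burden at the gene family level is a one-sided Poisson test: the observed de novo count summed over all family members is compared with the count expected from per-gene mutation rates. The sketch below assumes that approach; the abstract does not specify the statistical test, and the function names, gene labels, and mutation rates are hypothetical.

```python
import math
from typing import Dict


def family_expected_count(mutation_rates: Dict[str, float], n_trios: int) -> float:
    """Expected de novo count for a gene family under the null.

    `mutation_rates` maps each family member to an assumed per-chromosome,
    per-generation missense mutation rate; each proband carries two copies.
    """
    return 2 * n_trios * sum(mutation_rates.values())


def poisson_upper_tail(observed: int, expected: float) -> float:
    """Return P(X >= observed) for X ~ Poisson(expected).

    Computed as 1 - CDF(observed - 1); precision degrades for extremely
    small p-values, which is acceptable for an illustrative sketch.
    """
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for i in range(1, observed):
        term *= expected / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical two-gene family observed in 1,000 trios.
rates = {"GENE_A": 1e-5, "GENE_B": 2e-5}
expected = family_expected_count(rates, n_trios=1000)
p_value = poisson_upper_tail(observed=3, expected=expected)
```

Pooling counts across paralogs raises the expected count per test, which can give power to detect enrichment that no single family member reaches alone. A multiple-testing correction across all tested families would still be needed to claim exome-wide significance.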
This study presents the first method to incorporate gene family information into a statistical framework for interpreting variant data in NDDs and discovering new NDD-associated genes.