Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA; Department of Biomedical and Health Informatics (DBHi), Children's Hospital of Philadelphia, Philadelphia, PA 19146, USA.
Translational and Clinical Research Institute, Newcastle University, Newcastle-upon-Tyne NE1 7RU, UK; Royal Victoria Infirmary, Newcastle-upon-Tyne NE1 4LP, UK.
Am J Hum Genet. 2020 Oct 1;107(4):683-697. doi: 10.1016/j.ajhg.2020.08.003. Epub 2020 Aug 26.
More than 100 genetic etiologies have been identified in developmental and epileptic encephalopathies (DEEs), but correlating genetic findings with clinical features at scale has remained a hurdle because of a lack of frameworks for analyzing heterogeneous clinical data. Here, we analyzed 31,742 Human Phenotype Ontology (HPO) terms in 846 individuals with existing whole-exome trio data and assessed associated clinical features and phenotypic relatedness by using HPO-based semantic similarity analysis for individuals with de novo variants in the same gene. Gene-specific phenotypic signatures included associations of SCN1A with "complex febrile seizures" (HP: 0011172; p = 2.1 × 10) and "focal clonic seizures" (HP: 0002266; p = 8.9 × 10), STXBP1 with "absent speech" (HP: 0001344; p = 1.3 × 10), and SLC6A1 with "EEG with generalized slow activity" (HP: 0010845; p = 0.018). Of 41 genes with de novo variants in two or more individuals, 11 genes showed significant phenotypic similarity, including SCN1A (n = 16, p < 0.0001), STXBP1 (n = 14, p = 0.0021), and KCNB1 (n = 6, p = 0.011). Including genetic and phenotypic data of control subjects increased phenotypic similarity for all genetic etiologies, whereas the probability of observing de novo variants decreased, emphasizing the conceptual differences between semantic similarity analysis and approaches based on the expected number of de novo events. We demonstrate that HPO-based phenotype analysis captures unique profiles for distinct genetic etiologies, reflecting the breadth of the phenotypic spectrum in genetic epilepsies. Semantic similarity can be used to generate statistical evidence for disease causation analogous to the traditional approach of primarily defining disease entities through similar clinical features.
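As an illustration of the kind of analysis summarized above, the following minimal sketch computes a Resnik-style semantic similarity on a toy ontology and tests whether a group of individuals (for example, carriers of de novo variants in one gene) is more phenotypically similar than random groups of the same size. Everything here is an assumption made for illustration: the ontology, the term names (which are not real HPO identifiers), the toy cohort, the best-match-average patient score, the median as group statistic, and the permutation scheme are not taken from the abstract, which does not specify these details.

```python
import itertools
import math
import random
import statistics

# Toy ontology, child -> parents. Term names are hypothetical, not HPO IDs.
PARENTS = {
    "root": [],
    "seizure": ["root"],
    "febrile": ["seizure"],
    "focal": ["seizure"],
    "speech": ["root"],
    "absent_speech": ["speech"],
}

# Toy cohort: individual -> set of annotated terms (assumed non-empty).
COHORT = {
    "p1": {"febrile", "focal"},
    "p2": {"febrile", "focal"},
    "p3": {"febrile"},
    "p4": {"absent_speech"},
    "p5": {"absent_speech"},
    "p6": {"speech"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(cohort):
    """IC(t) = log2(N / n_t), where n_t counts individuals annotated
    with t or any of its descendants. Unused terms get IC 0."""
    n = len(cohort)
    counts = {t: 0 for t in PARENTS}
    for terms in cohort.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    return {t: math.log2(n / c) if c else 0.0 for t, c in counts.items()}


def resnik(t1, t2, ic):
    """IC of the most informative common ancestor of two terms."""
    return max(ic[a] for a in ancestors(t1) & ancestors(t2))


def patient_similarity(a, b, ic):
    """Symmetric best-match average between two term sets."""
    def one_way(x, y):
        return sum(max(resnik(s, t, ic) for t in y) for s in x) / len(x)
    return (one_way(a, b) + one_way(b, a)) / 2


def group_similarity(ids, cohort, ic):
    """Median pairwise similarity within a group of individuals."""
    return statistics.median(
        patient_similarity(cohort[i], cohort[j], ic)
        for i, j in itertools.combinations(ids, 2)
    )


def permutation_p(ids, cohort, ic, n_perm=1000, seed=0):
    """Empirical p-value: how often a random group of the same size
    is at least as similar as the observed group."""
    rng = random.Random(seed)
    observed = group_similarity(ids, cohort, ic)
    pool = sorted(cohort)
    hits = sum(
        group_similarity(rng.sample(pool, len(ids)), cohort, ic) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    ic = information_content(COHORT)
    print(permutation_p(["p1", "p2", "p3"], COHORT, ic))
```

In this sketch the information content is derived from the cohort itself, so adding individuals without the relevant terms makes those terms rarer and raises the similarity score of individuals who share them. That is one way to read the abstract's observation that including control subjects increased phenotypic similarity.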