Magielski Jan H, Ruggiero Sarah M, Xian Julie, Parthasarathy Shridhar, Galer Peter D, Ganesan Shiva, Back Amanda, McKee Jillian L, McSalley Ian, Gonzalez Alexander K, Morgan Angela, Donaher Joseph, Helbig Ingo
Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
Brain. 2025 Feb 3;148(2):663-674. doi: 10.1093/brain/awae264.
Speech and language disorders are known to have a substantial genetic contribution. Although frequently examined as components of other conditions, research on the genetic basis of linguistic differences as separate phenotypic subgroups has been limited so far. Here, we performed an in-depth characterization of speech and language disorders in 52 143 individuals, reconstructing clinical histories using a large-scale data-mining approach of the electronic medical records from an entire large paediatric healthcare network. The reported frequency of these disorders was the highest between 2 and 5 years old and spanned a spectrum of 26 broad speech and language diagnoses. We used natural language processing to assess the degree to which clinical diagnoses in full-text notes were reflected in ICD-10 diagnosis codes. We found that aphasia and speech apraxia could be retrieved easily through ICD-10 diagnosis codes, whereas stuttering as a speech phenotype was coded in only 12% of individuals through appropriate ICD-10 codes. We found significant comorbidity of speech and language disorders in neurodevelopmental conditions (30.31%) and, to a lesser degree, with epilepsies (6.07%) and movement disorders (2.05%). The most common genetic disorders retrievable in our analysis of electronic medical records were STXBP1 (n = 21), PTEN (n = 20) and CACNA1A (n = 18). When assessing associations of genetic diagnoses with specific linguistic phenotypes, we observed associations of STXBP1 and aphasia (P = 8.57 × 10⁻⁷, 95% confidence interval = 18.62-130.39) and MYO7A with speech and language development delay attributable to hearing loss (P = 1.24 × 10⁻⁵, 95% confidence interval = 17.46-infinity).
Finally, in a sub-cohort of 726 individuals with whole-exome sequencing data, we identified an enrichment of rare variants in neuronal receptor pathways, in addition to associations of UQCRC1 and KIF17 with expressive aphasia, MROH8 and BCHE with poor speech, and USP37, SLC22A9 and UMODL1 with aphasia. In summary, our study outlines the landscape of paediatric speech and language disorders, confirming the phenotypic complexity of linguistic traits and novel genotype-phenotype associations. Subgroups of paediatric speech and language disorders differ significantly with respect to the composition of monogenic aetiologies.
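The gene-phenotype associations above are reported as a P value with a confidence interval whose upper bound can be infinite, which is the pattern produced by an exact test on a 2 × 2 contingency table. The abstract does not name the test, so the following is only a minimal sketch under the assumption that a two-sided Fisher's exact test with a sample odds ratio is a reasonable stand-in. The function names and the counts in the usage lines are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration only):
        a = gene carriers with the phenotype
        b = gene carriers without the phenotype
        c = non-carriers with the phenotype
        d = non-carriers without the phenotype
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at least as extreme as the
    # observed one; the small tolerance guards against float ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(p_value, 1.0)


def sample_odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); infinite when b*c is zero."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


# Hypothetical counts, not data from the study.
table = (4, 2, 10, 984)
print("odds ratio:", sample_odds_ratio(*table))
print("two-sided P:", fisher_exact_two_sided(*table))
```

When no gene carriers lack the phenotype, or no non-carriers have it, the sample odds ratio is unbounded, which is consistent with a confidence interval that extends to infinity. Computing the interval itself requires a conditional likelihood method that this sketch does not implement.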