Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, 19104, USA.
J Neurodev Disord. 2022 May 23;14(1):32. doi: 10.1186/s11689-022-09442-0.
Autism spectrum disorder (ASD) is a complex neurodevelopmental condition characterized by restricted, repetitive behavior and by impaired social communication and interaction. However, significant challenges remain in diagnosing and subtyping ASD, due in part to the lack of a validated, standardized vocabulary for characterizing its clinical phenotypic presentation. Although the Human Phenotype Ontology (HPO) plays an important role in delineating nuanced phenotypes for rare genetic diseases, it does not adequately capture the characteristics of behavioral and psychiatric phenotypes in individuals with ASD. There is a clear need, therefore, for a well-established phenotype terminology set that can assist in characterizing ASD phenotypes from patients' clinical narratives.
To address this challenge, we used natural language processing (NLP) techniques to identify and curate ASD phenotypic terms from high-quality unstructured clinical notes in the electronic health records (EHR) of 8499 individuals with ASD, 8177 individuals with non-ASD psychiatric disorders, and 8482 individuals without a documented psychiatric disorder. We further performed dimensionality reduction and clustering analysis to subgroup individuals with ASD, using the nonnegative matrix factorization method.
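As an illustration of how nonnegative matrix factorization can subgroup individuals, the following minimal sketch factors a toy patient-by-term count matrix and assigns each patient to the component with the largest weight. The toy matrix, the function name `nmf`, the multiplicative-update solver, and the argmax assignment rule are all assumptions made for illustration; they are not taken from the study, which does not specify these details here.

```python
import numpy as np

def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate a nonnegative matrix V (patients x terms) as W @ H.

    Uses Lee-Seung multiplicative updates for the squared Frobenius
    error. This is a generic textbook solver, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    # Strictly positive random initialization keeps the updates well defined.
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a nonnegative ratio,
        # so W and H stay nonnegative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: 6 patients x 4 phenotype terms (term counts).
# Patients 0-2 mention only terms 0-1; patients 3-5 mention only terms 2-3.
V = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [4, 1, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 2],
    [0, 0, 1, 4],
], dtype=float)

W, H = nmf(V, rank=2)

# Assumed assignment rule: each patient joins the component
# on which it has the largest weight in W.
labels = W.argmax(axis=1)
print(labels)
```

In this sketch, the rows of `H` can be read as term profiles for each subgroup and the rows of `W` as each patient's loading on those profiles. The rank (number of subgroups) is fixed at 2 here only because the toy data has two obvious blocks.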
Through a note-processing pipeline comprising several state-of-the-art NLP steps, we identified 3336 ASD terms linked to 1943 unique medical concepts, which represents one of the largest ASD terminology sets to date. The extracted ASD terms were further organized into a formal ontology structure similar to the HPO. Clustering analysis showed that these terms could be used in a diagnostic pipeline to differentiate individuals with ASD from individuals with other psychiatric disorders.
Our ASD phenotype ontology can assist clinicians and researchers in characterizing individuals with ASD, facilitating automated diagnosis, and subtyping individuals with ASD to support personalized therapeutic decision-making.