Automated Extraction of Diagnostic Criteria From Electronic Health Records for Autism Spectrum Disorders: Development, Evaluation, and Application.

Authors

Leroy Gondy, Gu Yang, Pettygrove Sydney, Galindo Maureen K, Arora Ananyaa, Kurzius-Spencer Margaret

Affiliation

University of Arizona, Tucson, AZ, United States.

Publication

J Med Internet Res. 2018 Nov 7;20(11):e10497. doi: 10.2196/10497.

Abstract

BACKGROUND

Electronic health records (EHRs) bring many opportunities for information utilization. One such use is the surveillance conducted by the Centers for Disease Control and Prevention to track cases of autism spectrum disorder (ASD). This process currently comprises manual collection and review of EHRs of 4- and 8-year-old children in 11 US states for the presence of ASD criteria. The work is time-consuming and expensive.

OBJECTIVE

Our objective was to automatically extract from EHRs the descriptions of behaviors noted by clinicians as evidence of the diagnostic criteria in the Diagnostic and Statistical Manual of Mental Disorders (DSM). Previously, we reported on the classification of entire EHRs as ASD or not. In this work, we focus on the extraction of individual expressions of the different ASD criteria in the text. We intend to facilitate large-scale surveillance efforts for ASD and support analysis of changes over time as well as enable integration with other relevant data.

METHODS

We developed a natural language processing (NLP) parser to extract expressions of 12 DSM criteria using 104 patterns and 92 lexicons (1787 terms). The parser is rule-based to enable precise extraction of the entities from the text. The entities themselves are encompassed in the EHRs as very diverse expressions of the diagnostic criteria written by different people at different times (clinicians, speech pathologists, among others). Due to the sparsity of the data, a rule-based approach is best suited until larger datasets can be generated for machine learning algorithms.
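To make the lexicon-plus-pattern idea concrete, the following is a minimal sketch of how a rule-based extractor of this kind could be organized. It is not the authors' parser: the lexicon names, the terms, the two patterns, the `max_gap` parameter, and the criterion labels (which loosely follow DSM-IV-TR numbering) are all assumptions chosen for illustration.

```python
import re

# Hypothetical lexicons: a few illustrative terms, not the paper's 92 lexicons.
LEXICONS = {
    "NEGATION": ["does not", "did not", "no", "poor", "limited"],
    "EYE_CONTACT": ["eye contact", "eye gaze"],
    "MOTOR_MANNERISM": ["hand flapping", "rocking", "spinning"],
}

# Hypothetical patterns: an ordered list of lexicon slots mapped to a criterion label.
PATTERNS = [
    ("A1a", ["NEGATION", "EYE_CONTACT"]),
    ("A3c", ["MOTOR_MANNERISM"]),
]


def lexicon_regex(name):
    """Build an alternation over a lexicon's terms, longest term first."""
    terms = sorted(LEXICONS[name], key=len, reverse=True)
    return r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"


def compile_pattern(slots, max_gap=3):
    """Join the slots, allowing up to max_gap intervening words between them."""
    gap = r"(?:\W+\w+){0,%d}?\W+" % max_gap
    return re.compile(gap.join(lexicon_regex(s) for s in slots), re.IGNORECASE)


COMPILED = [(label, compile_pattern(slots)) for label, slots in PATTERNS]


def extract(sentence):
    """Return (criterion label, matched text) pairs found in one sentence."""
    return [
        (label, match.group(0))
        for label, regex in COMPILED
        for match in regex.finditer(sentence)
    ]


print(extract("He makes poor eye contact with the examiner."))
print(extract("He plays well with peers."))
```

Each rule fires only when every one of its lexicon slots is matched in order, which is one plausible reason a parser built this way would favor precision over recall: phrasings not covered by a lexicon or pattern are simply missed.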

RESULTS

We evaluated our rule-based parser and compared it with a machine learning baseline (decision tree). Using a test set of 6636 sentences (50 EHRs), we found that our parser achieved 76% precision, 43% recall (ie, sensitivity), and >99% specificity for criterion extraction. The performance was better for the rule-based approach than for the machine learning baseline (60% precision and 30% recall). For some individual criteria, precision was as high as 97% and recall 57%. Since precision was very high, we were assured that criteria were rarely assigned incorrectly, and our numbers presented a lower bound of their presence in EHRs. We then conducted a case study and parsed 4480 new EHRs covering 10 years of surveillance records from the Arizona Developmental Disabilities Surveillance Program. The social criteria (A1 criteria) showed the biggest change over the years. The communication criteria (A2 criteria) did not distinguish the ASD from the non-ASD records. Among behaviors and interests criteria (A3 criteria), 1 (A3b) was present with much greater frequency in the ASD than in the non-ASD EHRs.
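For readers less familiar with the three reported metrics, the sketch below shows how precision, recall (sensitivity), and specificity are computed from sentence-level confusion counts. The function name and the counts are toy values invented for illustration; they are not the study's data.

```python
def extraction_metrics(tp, fp, fn, tn):
    """Compute precision, recall, and specificity from confusion counts.

    tp: criterion correctly assigned; fp: criterion assigned in error;
    fn: criterion present but missed; tn: no criterion present, none assigned.
    """
    def ratio(numerator, denominator):
        # Guard against empty denominators in very small samples.
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy counts (hypothetical): 100 sentences, 7 of which express a criterion.
toy = extraction_metrics(tp=3, fp=1, fn=4, tn=92)
for name, value in toy.items():
    print(f"{name}: {value:.3f}")
```

In the toy counts, negatives far outnumber positives, so a single false positive barely lowers specificity while the missed cases pull recall down sharply. A similar imbalance would be expected whenever most sentences in a record express no criterion.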

CONCLUSIONS

Our results demonstrate that NLP can support large-scale analysis useful for ASD surveillance and research. In the future, we intend to facilitate detailed analysis and integration of national datasets.


