Automated Extraction of Diagnostic Criteria From Electronic Health Records for Autism Spectrum Disorders: Development, Evaluation, and Application.

Author information

Leroy Gondy, Gu Yang, Pettygrove Sydney, Galindo Maureen K, Arora Ananyaa, Kurzius-Spencer Margaret

Affiliation

University of Arizona, Tucson, AZ, United States.

Publication information

J Med Internet Res. 2018 Nov 7;20(11):e10497. doi: 10.2196/10497.

DOI:10.2196/10497

PMID:30404767

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6249505/

Abstract

BACKGROUND

Electronic health records (EHRs) bring many opportunities for using clinical information. One such use is the surveillance that the Centers for Disease Control and Prevention conducts to track cases of autism spectrum disorder (ASD). Currently, this process involves manually collecting and reviewing the EHRs of 4- and 8-year-old children in 11 US states for the presence of ASD criteria, work that is time-consuming and expensive.

OBJECTIVE

Our objective was to automatically extract from EHRs the descriptions of behaviors that clinicians recorded as evidence of the diagnostic criteria in the Diagnostic and Statistical Manual of Mental Disorders (DSM). Previously, we reported on classifying entire EHRs as ASD or non-ASD. In this work, we focus on extracting the individual expressions of each ASD criterion in the text. We aim to facilitate large-scale ASD surveillance, support analysis of changes over time, and enable integration with other relevant data.

METHODS

We developed a natural language processing (NLP) parser that extracts expressions of 12 DSM criteria using 104 patterns and 92 lexicons (1787 terms). The parser is rule-based to enable precise extraction of entities from the text. These entities appear in the EHRs as highly diverse expressions of the diagnostic criteria, written at different times by different authors (clinicians and speech pathologists, among others). Because the data are sparse, a rule-based approach is best suited until datasets large enough for machine learning algorithms become available.
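The parser combines lexicons (term lists) with patterns that say which lexicon terms must co-occur in a sentence. The sketch below illustrates that general idea under stated assumptions: the lexicon names, terms, patterns, gap window, and regex-based matching are all hypothetical choices for illustration, not the authors' 104 patterns or 92 lexicons, and the criterion labels only mirror the A1/A3 naming used in this abstract.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicons (lexicon name -> lowercase terms); not the authors' 92 lexicons.
LEXICONS = {
    "NEG": {"poor", "limited", "lacks", "no", "absent", "reduced"},
    "EYE_CONTACT": {"eye contact", "eye gaze"},
    "INTEREST": {"interest", "interested"},
    "PEER": {"peers", "friends", "other children"},
    "MOTOR": {"flaps", "rocks", "spins"},
    "BODY_PART": {"hands", "arms", "body"},
}


@dataclass
class Pattern:
    criterion: str   # illustrative criterion label
    slots: list      # ordered lexicon names that must all match
    window: int = 6  # max number of intervening words between consecutive slots


# Hypothetical patterns; the label assignments are for illustration only.
PATTERNS = [
    Pattern("A1a", ["NEG", "EYE_CONTACT"]),
    Pattern("A1b", ["NEG", "INTEREST", "PEER"]),
    Pattern("A3c", ["MOTOR", "BODY_PART"]),
]


def lexicon_regex(name: str) -> str:
    """Alternation over a lexicon's terms, longest first; multiword terms allow any whitespace."""
    terms = sorted(LEXICONS[name], key=len, reverse=True)
    alts = "|".join(r"\s+".join(map(re.escape, t.split())) for t in terms)
    return rf"\b(?:{alts})\b"


def compile_pattern(p: Pattern) -> re.Pattern:
    """Join the slot regexes with a bounded gap of up to p.window intervening words."""
    gap = rf"(?:\W+\w+){{0,{p.window}}}\W+"
    return re.compile(gap.join(lexicon_regex(s) for s in p.slots), re.IGNORECASE)


COMPILED = [(p.criterion, compile_pattern(p)) for p in PATTERNS]


def extract_criteria(sentence: str) -> list:
    """Return the sorted criterion labels whose pattern matches somewhere in the sentence."""
    return sorted({crit for crit, rx in COMPILED if rx.search(sentence)})


print(extract_criteria("Child makes poor eye contact and has limited interest in playing with peers."))
# ['A1a', 'A1b']
print(extract_criteria("Good eye contact noted."))
# []
```

Because each toy pattern requires a deficit term before the behavior term, "Good eye contact noted." is not flagged. A realistic rule set would also need negation handling and far more lexical variants, which is where the precision/recall trade-off reported below comes from.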

RESULTS

We evaluated our rule-based parser and compared it with a machine learning baseline (decision tree). On a test set of 6636 sentences (50 EHRs), the parser achieved 76% precision, 43% recall (ie, sensitivity), and >99% specificity for criterion extraction, outperforming the machine learning baseline (60% precision and 30% recall). For some individual criteria, precision reached 97% and recall 57%. Because precision was high, criteria were rarely assigned incorrectly; combined with the moderate recall, this means our counts represent a lower bound on the presence of the criteria in EHRs. We then conducted a case study, parsing 4480 new EHRs covering 10 years of surveillance records from the Arizona Developmental Disabilities Surveillance Program. The social criteria (A1 criteria) showed the largest change over the years. The communication criteria (A2 criteria) did not distinguish ASD from non-ASD records. Among the behaviors and interests criteria (A3 criteria), one criterion (A3b) appeared much more frequently in ASD than in non-ASD EHRs.
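The reported metrics come from comparing the parser's output with annotated sentences. A minimal sketch of how precision, recall (sensitivity), and specificity follow from confusion counts, plus the back-of-envelope arithmetic behind the lower-bound reading. Assumptions: the counting unit is taken to be (sentence, criterion) decisions, which the abstract does not specify; the toy counts are hypothetical; and the estimate assumes the test-set rates carry over to new records.

```python
from dataclasses import dataclass


def _safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising when the denominator is zero."""
    return num / den if den else 0.0


@dataclass
class Confusion:
    """Counts over (sentence, criterion) decisions; the counting unit is an assumption."""
    tp: int  # parser assigned the criterion and the annotator did too
    fp: int  # parser assigned it, the annotator did not
    fn: int  # annotator assigned it, the parser missed it
    tn: int  # neither assigned it

    def precision(self) -> float:
        return _safe_div(self.tp, self.tp + self.fp)

    def recall(self) -> float:  # also called sensitivity
        return _safe_div(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        return _safe_div(self.tn, self.tn + self.fp)


def estimated_true_mentions(extracted: int, precision: float, recall: float) -> float:
    """Rough estimate of the true number of criterion mentions.

    extracted * precision approximates the correct extractions, and those
    are only a `recall` fraction of all true mentions.
    """
    return _safe_div(extracted * precision, recall)


c = Confusion(tp=8, fp=2, fn=10, tn=980)  # hypothetical toy counts
print(f"precision={c.precision():.2f} recall={c.recall():.2f} specificity={c.specificity():.3f}")
# precision=0.80 recall=0.44 specificity=0.998

# Using the overall rates reported above (76% precision, 43% recall):
print(round(estimated_true_mentions(100, 0.76, 0.43), 1))
# 176.7
```

Since 0.76 / 0.43 is about 1.77, 100 raw extractions correspond to roughly 177 estimated true mentions. When precision exceeds recall in this way, raw counts understate the true frequency, which is consistent with treating them as a lower bound.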

CONCLUSIONS

Our results demonstrate that NLP can support large-scale analysis useful for ASD surveillance and research. In the future, we intend to facilitate detailed analysis and integration of national datasets.
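The case study compares how often each criterion appears in ASD versus non-ASD records across surveillance years. A minimal sketch of that record-level aggregation, assuming each EHR has already been reduced to a year, a case status, and the set of criteria the parser extracted from it. The function name, tuple layout, and toy records are hypothetical and are not the study's data or code.

```python
from collections import defaultdict


def criterion_prevalence(records):
    """Fraction of EHRs in each (year, group) that contain each extracted criterion.

    records: iterable of (year, is_asd, criteria) tuples, one per EHR, where
    `criteria` is the set of criterion labels the parser found anywhere in it.
    """
    totals = defaultdict(int)                     # EHRs per (year, group)
    hits = defaultdict(lambda: defaultdict(int))  # EHRs per (year, group) containing each criterion
    for year, is_asd, criteria in records:
        key = (year, "ASD" if is_asd else "non-ASD")
        totals[key] += 1
        for crit in set(criteria):                # count each criterion once per EHR
            hits[key][crit] += 1
    return {
        key: {crit: n / totals[key] for crit, n in sorted(hits[key].items())}
        for key in sorted(totals)
    }


# Hypothetical toy records, not surveillance data.
toy = [
    (2002, True, {"A1a", "A3b"}),
    (2002, True, {"A3b"}),
    (2002, False, {"A1a"}),
    (2002, False, set()),
]
for key, rates in criterion_prevalence(toy).items():
    print(key, rates)
# (2002, 'ASD') {'A1a': 0.5, 'A3b': 1.0}
# (2002, 'non-ASD') {'A1a': 0.5}
```

Normalizing by the number of records per year and group keeps years with different record counts comparable; a criterion such as A3b in the toy data stands out because its share differs sharply between the two groups.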

Similar articles

Prevalence and Characteristics of Autism Spectrum Disorder Among Children Aged 4 Years - Early Autism and Developmental Disabilities Monitoring Network, Seven Sites, United States, 2010, 2012, and 2014.

MMWR Surveill Summ. 2019 Apr 12;68(2):1-19. doi: 10.15585/mmwr.ss6802a1.

Prevalence of Autism Spectrum Disorder Among Children Aged 8 Years - Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2014.

MMWR Surveill Summ. 2018 Apr 27;67(6):1-23. doi: 10.15585/mmwr.ss6706a1.

Transparent deep learning to identify autism spectrum disorders (ASD) in EHR using clinical notes.

J Am Med Inform Assoc. 2024 May 20;31(6):1313-1321. doi: 10.1093/jamia/ocae080.

Early Identification of Autism Spectrum Disorder Among Children Aged 4 Years - Early Autism and Developmental Disabilities Monitoring Network, Six Sites, United States, 2016.

MMWR Surveill Summ. 2020 Mar 27;69(3):1-11. doi: 10.15585/mmwr.ss6903a1.

Prevalence and Characteristics of Autism Spectrum Disorder Among Children Aged 8 Years--Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2012.

MMWR Surveill Summ. 2016 Apr 1;65(3):1-23. doi: 10.15585/mmwr.ss6503a1.

Prevalence and Characteristics of Autism Spectrum Disorder Among Children Aged 8 Years - Autism and Developmental Disabilities Monitoring Network, 11 Sites, United States, 2012.

MMWR Surveill Summ. 2018 Nov 16;65(13):1-23. doi: 10.15585/mmwr.ss6513a1.

Prevalence of autism spectrum disorders--Autism and Developmental Disabilities Monitoring Network, 14 sites, United States, 2008.

MMWR Surveill Summ. 2012 Mar 30;61(3):1-19.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Comparison of autism spectrum disorder surveillance status based on two different diagnostic schemes: Findings from the Metropolitan Atlanta Developmental Disabilities Surveillance Program, 2012.

PLoS One. 2018 Nov 30;13(11):e0208079. doi: 10.1371/journal.pone.0208079. eCollection 2018.

Cited by

RAGing ahead in rheumatology: new language model architectures to tame artificial intelligence.

Ther Adv Musculoskelet Dis. 2025 Apr 21;17:1759720X251331529. doi: 10.1177/1759720X251331529. eCollection 2025.

Predicting neurodevelopmental disorders using machine learning models and electronic health records - status of the field.

J Neurodev Disord. 2024 Nov 15;16(1):63. doi: 10.1186/s11689-024-09579-0.

MONDEP: A unified SpatioTemporal MONitoring Framework for National DEPression Forecasting.

Heliyon. 2024 Aug 28;10(17):e36877. doi: 10.1016/j.heliyon.2024.e36877. eCollection 2024 Sep 15.

Coding of Childhood Psychiatric and Neurodevelopmental Disorders in Electronic Health Records of a Large Integrated Health Care System: Validation Study.

JMIR Ment Health. 2024 May 14;11:e56812. doi: 10.2196/56812.

A Prediction Model of Autism Spectrum Diagnosis from Well-Baby Electronic Data Using Machine Learning.

Children (Basel). 2024 Apr 3;11(4):429. doi: 10.3390/children11040429.

Transparent deep learning to identify autism spectrum disorders (ASD) in EHR using clinical notes.

J Am Med Inform Assoc. 2024 May 20;31(6):1313-1321. doi: 10.1093/jamia/ocae080.

Development of a real-world database for asthma and COPD: The SingHealth-Duke-NUS-GSK COPD and Asthma Real-World Evidence (SDG-CARE) collaboration.

BMC Med Inform Decis Mak. 2023 Jan 9;23(1):4. doi: 10.1186/s12911-022-02071-6.

Development and Evaluation of a Natural Language Processing Annotation Tool to Facilitate Phenotyping of Cognitive Status in Electronic Health Records: Diagnostic Study.

J Med Internet Res. 2022 Aug 30;24(8):e40384. doi: 10.2196/40384.

Characterization of time-variant and time-invariant assessment of suicidality on Reddit using C-SSRS.

PLoS One. 2021 May 17;16(5):e0250448. doi: 10.1371/journal.pone.0250448. eCollection 2021.

Machine Learning and Natural Language Processing in Mental Health: Systematic Review.

J Med Internet Res. 2021 May 4;23(5):e15708. doi: 10.2196/15708.
