Improving case definition of Crohn's disease and ulcerative colitis in electronic medical records using natural language processing: a novel informatics approach.

Author information

Gastrointestinal Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA.

Publication information

Inflamm Bowel Dis. 2013 Jun;19(7):1411-20. doi: 10.1097/MIB.0b013e31828133fd.

DOI:10.1097/MIB.0b013e31828133fd

PMID:23567779

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3665760/

Abstract

BACKGROUND

Previous studies identifying patients with inflammatory bowel disease using administrative codes have yielded inconsistent results. Our objective was to develop a robust electronic medical record-based model for classification of inflammatory bowel disease leveraging the combination of codified data and information from clinical text notes using natural language processing.

METHODS

Using the electronic medical records of 2 large academic centers, we created data marts for Crohn's disease (CD) and ulcerative colitis (UC) comprising patients with ≥1 International Classification of Diseases, 9th edition, code for each disease. We used codified data (i.e., International Classification of Diseases, 9th edition codes and electronic prescriptions) and narrative data from clinical notes to develop our classification model. Model development and validation were performed in a training set of 600 randomly selected patients for each disease, with medical record review as the gold standard. Logistic regression with the adaptive LASSO penalty was used to select informative variables.
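As a rough illustration of the variable selection step, the following is a minimal sketch of logistic regression with an adaptive LASSO penalty. It is not the authors' implementation: the abstract does not name the software, solver, or tuning procedure. The sketch assumes a proximal gradient solver, a fixed penalty `lam` (in practice it would be tuned, for example by cross-validation), purely synthetic data, and hypothetical feature names such as `icd9_code_count` and `nlp_disease_mentions`.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, l1_weights, lam=0.0, ridge=0.0, step=0.5, iters=400):
    """Proximal gradient descent on the mean logistic loss plus
    (ridge/2)*sum(b_j^2) + lam*sum(l1_weights[j]*|b_j|).
    The intercept is left unpenalized."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(iters):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        b0 -= step * g0 / n
        for j in range(p):
            z = b[j] - step * (g[j] / n + ridge * b[j])
            t = step * lam * l1_weights[j]
            # Soft-thresholding: shrink toward zero; exactly zero if |z| <= t.
            b[j] = math.copysign(max(abs(z) - t, 0.0), z)
    return b0, b

def adaptive_lasso_logistic(X, y, lam=0.1, gamma=1.0):
    p = len(X[0])
    # Step 1: initial estimate from a lightly ridge-penalized fit (no L1 term).
    _, b_init = fit_logistic(X, y, [0.0] * p, lam=0.0, ridge=0.01)
    # Step 2: adaptive weights; small initial coefficients get large penalties.
    weights = [1.0 / max(abs(bj), 1e-8) ** gamma for bj in b_init]
    # Step 3: weighted L1-penalized fit.
    return fit_logistic(X, y, weights, lam=lam)

# Synthetic data only: two informative features and two pure-noise features.
# All feature names are hypothetical and not taken from the study.
rng = random.Random(0)
names = ["icd9_code_count", "nlp_disease_mentions", "noise_a", "noise_b"]
X, y = [], []
for _ in range(300):
    row = [rng.gauss(0.0, 1.0) for _ in names]
    logit = 2.0 * row[0] + 2.0 * row[1]
    X.append(row)
    y.append(1 if rng.random() < sigmoid(logit) else 0)

intercept, coefs = adaptive_lasso_logistic(X, y)
selected = [name for name, c in zip(names, coefs) if c != 0.0]
print(selected)
```

The adaptive weights are what distinguish this from an ordinary LASSO: features with weak initial coefficients are penalized more heavily, so on data like this the noise columns tend to be set exactly to zero while the informative ones are retained.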

RESULTS

We confirmed 399 CD cases (67%) in the CD training set and 378 UC cases (63%) in the UC training set. For both, a combined model including narrative and codified data had better accuracy (area under the curve for CD 0.95; UC 0.94) than models using only disease International Classification of Diseases, 9th edition codes (area under the curve 0.89 for CD; 0.86 for UC). Addition of natural language processing narrative terms to our final model resulted in classification of 6% to 12% more subjects with the same accuracy.
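The area under the curve used to compare the models above can be read as the probability that a randomly chosen confirmed case receives a higher model score than a randomly chosen non-case. A minimal sketch of that pairwise definition follows; the scores and labels are made-up toy values, not study data, and the variable names `codes_only` and `combined` are illustrative assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (case, non-case) pairs in which the case has the higher score.
    Tied pairs count as half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = chart-confirmed case, 0 = non-case (hypothetical values).
labels = [0, 0, 1, 1]
codes_only = [0.1, 0.4, 0.35, 0.8]   # scores from a hypothetical codes-only model
combined = [0.1, 0.3, 0.35, 0.8]     # scores from a hypothetical combined model
print(auc(codes_only, labels), auc(combined, labels))
```

This pairwise form is quadratic in the number of patients, which is fine for a training set of a few hundred charts; a rank-based formula would be the usual choice for larger cohorts.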

CONCLUSIONS

Inclusion of narrative concepts identified using natural language processing improves the accuracy of electronic medical records case definition for CD and UC while simultaneously identifying more subjects compared with models using codified data alone.
