Identifying patient smoking status from medical discharge records.

Authors

Uzuner Ozlem, Goldstein Ira, Luo Yuan, Kohane Isaac

Affiliation

University at Albany, SUNY, Draper 114A, 135 Western Avenue, Albany, NY 12222, USA.

Publication information

J Am Med Inform Assoc. 2008 Jan-Feb;15(1):14-24. doi: 10.1197/jamia.M2408. Epub 2007 Oct 18.

DOI:10.1197/jamia.M2408

PMID:17947624

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2274873/

Abstract

The authors organized a Natural Language Processing (NLP) challenge on automatically determining the smoking status of patients from information found in their discharge records. This challenge was issued as a part of the i2b2 (Informatics for Integrating Biology to the Bedside) project, to survey, facilitate, and examine studies in medical language understanding for clinical narratives. This article describes the smoking challenge, details the data and the annotation process, explains the evaluation metrics, discusses the characteristics of the systems developed for the challenge, presents an analysis of the results of received system runs, draws conclusions about the state of the art, and identifies directions for future research. A total of 11 teams participated in the smoking challenge. Each team submitted up to three system runs, providing a total of 23 submissions. The submitted system runs were evaluated with microaveraged and macroaveraged precision, recall, and F-measure. The systems submitted to the smoking challenge represented a variety of machine learning and rule-based algorithms. Despite the differences in their approaches to smoking status identification, many of these systems provided good results. There were 12 system runs with microaveraged F-measures above 0.84. Analysis of the results highlighted the fact that discharge summaries express smoking status using a limited number of textual features (e.g., "smok", "tobac", "cigar", Social History, etc.). Many of the effective smoking status identifiers benefit from these features.
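The difference between the microaveraged and macroaveraged scores mentioned in the abstract can be illustrated with a minimal sketch. The label names, the toy gold and predicted lists, and the function names below are assumptions made for illustration only; they are not taken from the challenge data. The macroaveraged F-measure here is the mean of the per-class F-measures, which is one common convention and may differ from the exact definition used in the article.

```python
from collections import Counter


def prf(tp, fp, fn):
    """Precision, recall, and F-measure from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def micro_macro(gold, predicted):
    """Return ((micro P, R, F), (macro P, R, F)) for single-label classification."""
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[g] += 1  # the gold class gains a false negative
    # Micro: pool the counts over all classes, then compute the scores once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute the scores per class, then take the unweighted mean.
    per_label = [prf(tp[l], fp[l], fn[l]) for l in labels]
    macro = tuple(sum(s[i] for s in per_label) / len(labels) for i in range(3))
    return micro, macro


# Hypothetical toy data: one frequent class and two rare ones.
gold = ["UNKNOWN"] * 6 + ["NON"] * 2 + ["CURRENT"] * 2
predicted = ["UNKNOWN"] * 6 + ["NON"] * 2 + ["NON", "CURRENT"]

micro, macro = micro_macro(gold, predicted)
print("micro P/R/F:", micro)
print("macro P/R/F:", macro)
```

In this toy case a single error on a rare class lowers the macroaveraged scores more than the microaveraged ones, because macroaveraging gives every class equal weight regardless of how many records it contains.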

Similar articles

Identifying patient smoking status from medical discharge records.

J Am Med Inform Assoc. 2008 Jan-Feb;15(1):14-24. doi: 10.1197/jamia.M2408. Epub 2007 Oct 18.

Evaluating the state-of-the-art in automatic de-identification.

J Am Med Inform Assoc. 2007 Sep-Oct;14(5):550-63. doi: 10.1197/jamia.M2444. Epub 2007 Jun 28.

Recognizing obesity and comorbidities in sparse data.

J Am Med Inform Assoc. 2009 Jul-Aug;16(4):561-70. doi: 10.1197/jamia.M3115. Epub 2009 Apr 23.

Use of semantic features to classify patient smoking status.

AMIA Annu Symp Proc. 2008 Nov 6;2008:450-4.

A system for classifying disease comorbidity status from medical discharge summaries using automated hotspot and negated concept detection.

J Am Med Inform Assoc. 2009 Jul-Aug;16(4):590-5. doi: 10.1197/jamia.M3095. Epub 2009 Apr 23.

Five-way smoking status classification using text hot-spot identification and error-correcting output codes.

J Am Med Inform Assoc. 2008 Jan-Feb;15(1):32-5. doi: 10.1197/jamia.M2434. Epub 2007 Oct 18.

Mayo clinic NLP system for patient smoking status identification.

J Am Med Inform Assoc. 2008 Jan-Feb;15(1):25-8. doi: 10.1197/jamia.M2437. Epub 2007 Oct 18.

A study of machine-learning-based approaches to extract clinical entities and their assertions from discharge summaries.

J Am Med Inform Assoc. 2011 Sep-Oct;18(5):601-6. doi: 10.1136/amiajnl-2011-000163. Epub 2011 Apr 20.

Medical i2b2 NLP smoking challenge: the A-Life system architecture and methodology.

J Am Med Inform Assoc. 2008 Jan-Feb;15(1):40-3. doi: 10.1197/jamia.M2438. Epub 2007 Oct 18.

A text mining approach to the prediction of disease status from clinical discharge summaries.

J Am Med Inform Assoc. 2009 Jul-Aug;16(4):596-600. doi: 10.1197/jamia.M3096. Epub 2009 Apr 23.

Cited by

Keyword-optimized template insertion for clinical note classification via prompt-based learning.

BMC Med Inform Decis Mak. 2025 Jul 3;25(1):247. doi: 10.1186/s12911-025-03071-y.

SmokeBERT: A BERT-based Model for Quantitative Smoking History Extraction from Clinical Narratives to Improve Lung Cancer Screening.

medRxiv. 2025 Jun 20:2025.06.18.25329870. doi: 10.1101/2025.06.18.25329870.

Secondary Use of Clinical Problem List Descriptions for Bi-Encoder Based ICD-10 Classification.

AMIA Annu Symp Proc. 2025 May 22;2024:620-627. eCollection 2024.

Question Answering for Electronic Health Records: Scoping Review of Datasets and Models.

J Med Internet Res. 2024 Oct 30;26:e53636. doi: 10.2196/53636.

Identification of patients' smoking status using an explainable AI approach: a Danish electronic health records case study.

BMC Med Res Methodol. 2024 May 17;24(1):114. doi: 10.1186/s12874-024-02231-4.

Development of a social and environmental determinants of health informatics maturity model.

J Clin Transl Sci. 2023 Dec 7;7(1):e266. doi: 10.1017/cts.2023.691. eCollection 2023.

Natural Language Processing for Radiation Oncology: Personalizing Treatment Pathways.

Pharmgenomics Pers Med. 2024 Feb 13;17:65-76. doi: 10.2147/PGPM.S396971. eCollection 2024.

Text Classification of Cancer Clinical Trial Eligibility Criteria.

AMIA Annu Symp Proc. 2024 Jan 11;2023:1304-1313. eCollection 2023.

Capturing Individual-level Social Determinants from Clinical Text.

AMIA Annu Symp Proc. 2024 Jan 11;2023:484-493. eCollection 2023.

Impact of possible errors in natural language processing-derived data on downstream epidemiologic analysis.

JAMIA Open. 2023 Dec 27;6(4):ooad111. doi: 10.1093/jamiaopen/ooad111. eCollection 2023 Dec.

References

Five-way smoking status classification using text hot-spot identification and error-correcting output codes.

J Am Med Inform Assoc. 2008 Jan-Feb;15(1):32-5. doi: 10.1197/jamia.M2434. Epub 2007 Oct 18.

Mayo clinic NLP system for patient smoking status identification.

J Am Med Inform Assoc. 2008 Jan-Feb;15(1):25-8. doi: 10.1197/jamia.M2437. Epub 2007 Oct 18.

Medical i2b2 NLP smoking challenge: the A-Life system architecture and methodology.

J Am Med Inform Assoc. 2008 Jan-Feb;15(1):40-3. doi: 10.1197/jamia.M2438. Epub 2007 Oct 18.

Using implicit information to identify smoking status in smoke-blind medical discharge summaries.

J Am Med Inform Assoc. 2008 Jan-Feb;15(1):29-31. doi: 10.1197/jamia.M2440. Epub 2007 Oct 18.

Identifying smokers with a medical extraction system.

J Am Med Inform Assoc. 2008 Jan-Feb;15(1):36-9. doi: 10.1197/jamia.M2442. Epub 2007 Oct 18.

Evaluating the state-of-the-art in automatic de-identification.

J Am Med Inform Assoc. 2007 Sep-Oct;14(5):550-63. doi: 10.1197/jamia.M2444. Epub 2007 Jun 28.

A suite of natural language processing tools developed for the I2B2 project.

AMIA Annu Symp Proc. 2006;2006:931.

Syntactically-informed semantic category recognition in discharge summaries.

AMIA Annu Symp Proc. 2006;2006:714-8.

Extracting principal diagnosis, co-morbidity and smoking status for asthma research: evaluation of a natural language processing system.

BMC Med Inform Decis Mak. 2006 Jul 26;6:30. doi: 10.1186/1472-6947-6-30.

Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues.

BMC Bioinformatics. 2006 Jul 5;7:334. doi: 10.1186/1471-2105-7-334.
