Use of semantic features to classify patient smoking status.

Authors

McCormick Patrick J, Elhadad Noémie, Stetson Peter D

Affiliation

College of Physicians & Surgeons, Columbia University, New York, NY, USA.

Publication

AMIA Annu Symp Proc. 2008 Nov 6;2008:450-4.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2655942/

Abstract

The recent i2b2 NLP Challenge smoking classification task offers a rare chance to compare different natural language processing techniques on actual clinical data. We compare the performance of a classifier which relies on semantic features generated by an unmodified version of MedLEE, a clinical NLP engine, to one using lexical features. We also compare the performance of supervised classifiers to rule-based symbolic classifiers. Our baseline supervised classifier with lexical features yields a microaveraged F-measure of 0.81. Our rule-based classifier using MedLEE semantic features is superior, with an F-measure of 0.83. Our supervised classifier trained with semantic MedLEE features is competitive with the top-performing smoking classifier in the i2b2 NLP Challenge, with microaveraged precision of 0.90, recall of 0.89, and F-measure of 0.89.
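For readers unfamiliar with the metric, the following is a minimal sketch of how microaveraged precision, recall, and F-measure can be computed for a multi-class task such as smoking status classification: true positives, false positives, and false negatives are pooled across all classes before the ratios are taken. The function name `micro_prf`, the label strings, and the toy data are illustrative assumptions; this is not the authors' evaluation code or the challenge's official scorer.

```python
from collections import Counter


def micro_prf(gold, predicted):
    """Return micro-averaged (precision, recall, F-measure) for two label lists.

    Counts are pooled over all classes before the ratios are computed.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1   # correct prediction for class g
        else:
            fp[p] += 1   # predicted class p gains a false positive
            fn[g] += 1   # gold class g gains a false negative

    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())

    precision = total_tp / (total_tp + total_fp) if total_tp + total_fp else 0.0
    recall = total_tp / (total_tp + total_fn) if total_tp + total_fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Toy example with hypothetical labels (not data from the paper).
gold = ["CURRENT SMOKER", "PAST SMOKER", "NON-SMOKER",
        "UNKNOWN", "UNKNOWN", "SMOKER"]
predicted = ["CURRENT SMOKER", "CURRENT SMOKER", "NON-SMOKER",
             "UNKNOWN", "UNKNOWN", "UNKNOWN"]

p, r, f = micro_prf(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

Because the counts are pooled, frequent classes dominate the microaveraged score; a macroaverage would instead compute the metrics per class and then average them.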
