A study of transportability of an existing smoking status detection module across institutions.

Authors

Liu Mei, Shah Anushi, Jiang Min, Peterson Neeraja B, Dai Qi, Aldrich Melinda C, Chen Qingxia, Bowton Erica A, Liu Hongfang, Denny Joshua C, Xu Hua

Affiliation

Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA.

Publication information

AMIA Annu Symp Proc. 2012;2012:577-86. Epub 2012 Nov 3.

PMID:23304330

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3540509/

Abstract

Electronic Medical Records (EMRs) are valuable resources for clinical observational studies. A patient's smoking status is a key factor in many diseases, but it is often embedded in narrative text. Natural language processing (NLP) systems have been developed for this specific task, such as the smoking status detection module in the clinical Text Analysis and Knowledge Extraction System (cTAKES). This study examined the transportability of the cTAKES smoking module to Vanderbilt University Hospital's EMR data. Our evaluation demonstrated that a modest amount of customization is necessary to achieve desirable performance. We modified the system by filtering notes, annotating new data for training the machine learning classifier, and adding rules to the rule-based classifiers. Our results showed that the customized module achieved significantly higher F-measures at all levels of classification (i.e., sentence, document, patient) compared to the direct application of the cTAKES module to the Vanderbilt data.
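The abstract reports F-measures at the sentence, document, and patient levels. As a minimal sketch of how a per-label F-measure is typically computed from gold-standard and predicted labels, the following may help; the label names and the toy data are illustrative assumptions, not taken from the paper, and the same function would be applied separately to the labels at each level.

```python
def f_measure(gold, pred, label, beta=1.0):
    """Return (precision, recall, F) for one label.

    gold and pred are equal-length sequences of labels, one entry per
    unit being classified (a sentence, a document, or a patient).
    """
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical labels for illustration only (not data from the study).
gold = ["CURRENT", "PAST", "NON", "CURRENT", "UNKNOWN", "PAST"]
pred = ["CURRENT", "CURRENT", "NON", "CURRENT", "UNKNOWN", "NON"]

for label in ("CURRENT", "PAST", "NON", "UNKNOWN"):
    p, r, f = f_measure(gold, pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

With `beta=1.0` this is the balanced F1 score; whether the study used a balanced or weighted variant, or micro- versus macro-averaging across labels, is not stated in the abstract.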

Similar articles

Natural language processing and machine learning to enable automatic extraction and classification of patients' smoking status from electronic medical records.

Ups J Med Sci. 2020 Nov;125(4):316-324. doi: 10.1080/03009734.2020.1792010. Epub 2020 Jul 22.

Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach.

BMC Med Inform Decis Mak. 2017 Dec 1;17(1):155. doi: 10.1186/s12911-017-0556-8.

[A customized method for information extraction from unstructured text data in the electronic medical records].

Beijing Da Xue Xue Bao Yi Xue Ban. 2018 Apr 18;50(2):256-263.

Classifying social determinants of health from unstructured electronic health records using deep learning-based natural language processing.

J Biomed Inform. 2022 Mar;127:103984. doi: 10.1016/j.jbi.2021.103984. Epub 2022 Jan 7.

The Yale cTAKES extensions for document classification: architecture and application.

J Am Med Inform Assoc. 2011 Sep-Oct;18(5):614-20. doi: 10.1136/amiajnl-2011-000093. Epub 2011 May 27.

Ensembles of natural language processing systems for portable phenotyping solutions.

J Biomed Inform. 2019 Dec;100:103318. doi: 10.1016/j.jbi.2019.103318. Epub 2019 Oct 23.

Part-of-speech tagging for clinical text: wall or bridge between institutions?

AMIA Annu Symp Proc. 2011;2011:382-91. Epub 2011 Oct 22.

Adapting existing natural language processing resources for cardiovascular risk factors identification in clinical notes.

J Biomed Inform. 2015 Dec;58 Suppl(Suppl):S128-S132. doi: 10.1016/j.jbi.2015.08.002. Epub 2015 Aug 28.

RegEMR: a natural language processing system to automatically identify premature ovarian decline from Chinese electronic medical records.

BMC Med Inform Decis Mak. 2023 Jul 18;23(1):126. doi: 10.1186/s12911-023-02239-8.

Cited by

The validity of electronic health data for measuring smoking status: a systematic review and meta-analysis.

BMC Med Inform Decis Mak. 2024 Feb 2;24(1):33. doi: 10.1186/s12911-024-02416-3.

A cross-institutional evaluation on breast cancer phenotyping NLP algorithms on electronic health records.

Comput Struct Biotechnol J. 2023 Aug 22;22:32-40. doi: 10.1016/j.csbj.2023.08.018. eCollection 2023.

An open natural language processing (NLP) framework for EHR-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C).

J Am Med Inform Assoc. 2023 Nov 17;30(12):2036-2040. doi: 10.1093/jamia/ocad134.

Automated Detection of Substance-Use Status and Related Information from Clinical Text.

Sensors (Basel). 2022 Dec 8;22(24):9609. doi: 10.3390/s22249609.

Extracting social determinants of health from electronic health records using natural language processing: a systematic review.

J Am Med Inform Assoc. 2021 Nov 25;28(12):2716-2727. doi: 10.1093/jamia/ocab170.

Defining Phenotypes from Clinical Data to Drive Genomic Research.

Annu Rev Biomed Data Sci. 2018 Jul;1:69-92. doi: 10.1146/annurev-biodatasci-080917-013335. Epub 2018 Apr 25.

A Natural Language Processing Tool to Extract Quantitative Smoking Status from Clinical Narratives.

Proc (IEEE Int Conf Healthc Inform). 2020 Nov-Dec;2020. doi: 10.1109/ICHI48887.2020.9374369. Epub 2021 Mar 12.

Primary Care Artificial Intelligence: A Branch Hiding in Plain Sight.

Ann Fam Med. 2020 May;18(3):194-195. doi: 10.1370/afm.2533.

Developing Customizable Cancer Information Extraction Modules for Pathology Reports Using CLAMP.

Stud Health Technol Inform. 2019 Aug 21;264:1041-1045. doi: 10.3233/SHTI190383.

Assessing data availability and quality within an electronic health record system through external validation against an external clinical data source.

BMC Med Inform Decis Mak. 2019 Jul 25;19(1):143. doi: 10.1186/s12911-019-0864-2.

References

Portability of an algorithm to identify rheumatoid arthritis in electronic health records.

J Am Med Inform Assoc. 2012 Jun;19(e1):e162-9. doi: 10.1136/amiajnl-2011-000583. Epub 2012 Feb 28.

Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study.

J Am Med Inform Assoc. 2012 Mar-Apr;19(2):212-8. doi: 10.1136/amiajnl-2011-000439. Epub 2011 Nov 19.

Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies.

Am J Hum Genet. 2011 Oct 7;89(4):529-42. doi: 10.1016/j.ajhg.2011.09.008.

Textractor: a hybrid system for medications and reason for their prescription extraction from clinical text documents.

J Am Med Inform Assoc. 2010 Sep-Oct;17(5):559-62. doi: 10.1136/jamia.2010.004028.

Medication information extraction with linguistic pattern matching and semantic rules.

J Am Med Inform Assoc. 2010 Sep-Oct;17(5):532-5. doi: 10.1136/jamia.2010.003657.

High accuracy information extraction of medication information from clinical notes: 2009 i2b2 medication extraction challenge.

J Am Med Inform Assoc. 2010 Sep-Oct;17(5):524-7. doi: 10.1136/jamia.2010.003939.

Extracting medication information from clinical text.

J Am Med Inform Assoc. 2010 Sep-Oct;17(5):514-8. doi: 10.1136/jamia.2010.003947.

An overview of MetaMap: historical perspective and recent advances.

J Am Med Inform Assoc. 2010 May-Jun;17(3):229-36. doi: 10.1136/jamia.2009.002733.

Mayo clinic smoking status classification system: extensions and improvements.

AMIA Annu Symp Proc. 2009 Nov 14;2009:619-23.

MedEx: a medication information extraction system for clinical narratives.

J Am Med Inform Assoc. 2010 Jan-Feb;17(1):19-24. doi: 10.1197/jamia.M3378.
