Practical implementation of an existing smoking detection pipeline and reduced support vector machine training corpus requirements.

Affiliation

Department of Radiation Oncology, Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia.

Publication information

J Am Med Inform Assoc. 2014 Jan-Feb;21(1):27-30. doi: 10.1136/amiajnl-2013-002090. Epub 2013 Aug 6.

Abstract

This study aimed to reduce reliance on large training datasets in support vector machine (SVM)-based clinical text analysis by categorizing keyword features. An enhanced Mayo smoking status detection pipeline was deployed. We used a corpus of 709 annotated patient narratives. The pipeline was optimized for local data entry practice and lexicon. SVM classifier retraining used a grouped keyword approach for better efficiency. Accuracy, precision, and F-measure of the unaltered and optimized pipelines were evaluated using k-fold cross-validation. Initial accuracy of the clinical Text Analysis and Knowledge Extraction System (cTAKES) package was 0.69. Localization and keyword grouping improved system accuracy to 0.9 and 0.92, respectively. F-measures for current and past smoker classes improved from 0.43 to 0.81 and 0.71 to 0.91, respectively. Non-smoker and unknown-class F-measures were 0.96 and 0.98, respectively. Keyword grouping had no negative effect on performance, and decreased training time. Grouping keywords is a practical method to reduce training corpus size.
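
The keyword grouping idea described in the abstract can be illustrated with a minimal sketch. The abstract does not list the actual keyword groups or the feature encoding, and cTAKES itself is a Java toolkit, so everything below is an assumption made for illustration only: the group names, the keywords in `KEYWORD_GROUPS`, and the helper functions `grouped_features` and `ungrouped_features` are hypothetical. The sketch only shows why grouping may reduce the amount of training data needed: notes that use different words from the same group map to the same feature vector, so a classifier does not have to see every individual keyword during training.

```python
import re
from collections import Counter

# Hypothetical keyword groups (assumption: the paper's real groups
# are not given in the abstract).
KEYWORD_GROUPS = {
    "NEGATION": {"never", "non", "denies", "no"},
    "PAST": {"ex", "former", "quit", "ceased", "stopped"},
    "CURRENT": {"current", "smokes", "smoking", "continues"},
}

# Reverse index: individual keyword -> group label.
KEYWORD_TO_GROUP = {
    kw: group for group, kws in KEYWORD_GROUPS.items() for kw in kws
}
GROUP_ORDER = sorted(KEYWORD_GROUPS)   # ['CURRENT', 'NEGATION', 'PAST']
VOCAB_ORDER = sorted(KEYWORD_TO_GROUP)  # one slot per individual keyword


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def ungrouped_features(text):
    """One count per individual keyword (larger, sparser vector)."""
    counts = Counter(tokenize(text))
    return [counts[kw] for kw in VOCAB_ORDER]


def grouped_features(text):
    """One count per keyword group (smaller, denser vector)."""
    counts = Counter(
        KEYWORD_TO_GROUP[tok] for tok in tokenize(text) if tok in KEYWORD_TO_GROUP
    )
    return [counts[group] for group in GROUP_ORDER]


if __name__ == "__main__":
    note_a = "Ex-smoker, quit 10 years ago."
    note_b = "Former smoker, stopped in 2001"
    print(grouped_features(note_a), grouped_features(note_b))
    print(ungrouped_features(note_a) == ungrouped_features(note_b))
```

In this sketch the two example notes share no keyword, so their ungrouped vectors differ, while their grouped vectors are identical. Either vector could then be passed to an SVM implementation of choice; that step is left out here because the abstract does not describe the classifier configuration.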
