Comparison of 2 Natural Language Processing Methods for Identification of Bleeding Among Critically Ill Patients.

Author Affiliations

Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City.

Division of Cardiovascular Medicine, University of Utah School of Medicine, Salt Lake City.

Publication Information

JAMA Netw Open. 2018 Oct 5;1(6):e183451. doi: 10.1001/jamanetworkopen.2018.3451.

Abstract

IMPORTANCE

To improve patient safety, health care systems need reliable methods to detect adverse events in large patient populations. Events are often described in clinical notes rather than in structured data, which makes them difficult to identify on a large scale.

OBJECTIVE

To develop and compare 2 natural language processing methods, a rules-based approach and a machine learning (ML) approach, for identifying bleeding events in clinical notes.

DESIGN, SETTING, AND PARTICIPANTS

This diagnostic study used deidentified notes from the Medical Information Mart for Intensive Care, which spans 2001 to 2012. A training set of 990 notes and a test set of 660 notes were randomly selected. Physicians classified each note as present or absent for a clinically relevant bleeding event during the hospitalization. A bleeding dictionary was developed for the rules-based approach; bleeding mentions were then aggregated to arrive at a classification for each note. Three ML models (support vector machine, extra trees, and convolutional neural network) were developed and trained using the 990-note training set. Another instance of each ML model was also trained on a sample of 450 notes, with equal numbers of bleeding-present and bleeding-absent notes. The notes were represented using term frequency-inverse document frequency vectors and global vectors for word representation.
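The rules-based idea described above (find bleeding mentions with a dictionary, then aggregate them into one label per note) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the term list, the negation cues, the 30-character context window, and the "any affirmed mention" aggregation rule are hypothetical and are not the study's actual dictionary or rules.

```python
import re

# Hypothetical mini-dictionary; the study's actual bleeding dictionary is not reproduced here.
BLEEDING_TERMS = ["bleeding", "hemorrhage", "hematoma", "melena", "hematemesis"]
# Hypothetical negation cues, also an assumption.
NEGATION_CUES = ["no", "denies", "without", "negative for"]

TERM_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, BLEEDING_TERMS)) + r")\b", re.IGNORECASE
)
NEG_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, NEGATION_CUES)) + r")\b", re.IGNORECASE
)


def classify_mentions(note, window=30):
    """Return one label per bleeding mention: 'present' or 'negated'."""
    labels = []
    for match in TERM_RE.finditer(note):
        # Inspect up to `window` characters before the mention.
        start = max(0, match.start() - window)
        context = note[start:match.start()]
        # Keep only the part that belongs to the mention's own sentence.
        context = re.split(r"[.;\n]", context)[-1]
        labels.append("negated" if NEG_RE.search(context) else "present")
    return labels


def classify_note(note):
    """Aggregate mentions: any affirmed mention marks the note bleeding-present."""
    return "present" if "present" in classify_mentions(note) else "absent"


if __name__ == "__main__":
    print(classify_note("No fever. GI bleeding noted overnight."))
    print(classify_note("No evidence of bleeding. Denies melena."))
```

A note-level rule this permissive favors sensitivity over positive predictive value, which is one plausible reason to aggregate mentions rather than rely on any single one.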

MAIN OUTCOMES AND MEASURES

The main outcomes were accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for each model. Following training, the models were tested on the test set and sensitivities were compared using a McNemar test.
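Because both approaches are scored on the same test notes, their sensitivities are paired, which is why a McNemar test applies. The sketch below shows the exact (binomial) form of the test as a hedged illustration; the abstract does not say which variant the authors used, and the function name and the example counts are hypothetical. When comparing sensitivities, `b` and `c` would be the numbers of bleeding-present notes caught by only one of the two methods.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts b and c.

    b: notes classified correctly by method A only
    c: notes classified correctly by method B only
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Probability of a split at least as lopsided as the observed one (one tail).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical discordant counts, for illustration only.
    print(mcnemar_exact(1, 9))
```

Only discordant pairs enter the calculation; notes that both methods get right, or both get wrong, carry no information about which method is more sensitive.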

RESULTS

The 990-note training set represented 769 patients (296 [38.5%] female; mean [SD] age, 67.42 [14.7] years). The 660-note test set represented 527 patients (211 [40.0%] female; mean [SD] age, 67.86 [14.7] years). Bleeding was present in 146 notes (22.1%). The extra trees down-sampled model and rules-based approaches were similarly sensitive (93.8% vs 91.1%; difference, 2.7%; 95% CI, -3.8% to 7.9%; P = .44). The positive predictive value for the extra trees model, however, was 48.6%. The rules-based model had the best performance overall, with 84.6% specificity, 62.7% positive predictive value, and 97.1% negative predictive value.
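The rules-based figures above can be cross-checked with a short calculation. The confusion-matrix counts below are not reported in the abstract; they are reconstructed here as an assumption from the 660 test notes, the 146 bleeding-present notes, and the rounded sensitivity and specificity, and they appear to be the integer counts consistent with those percentages.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Reconstructed (assumed) counts for the rules-based approach:
# 146 bleeding-present notes -> 133 detected, 13 missed
# 514 bleeding-absent notes  -> 435 correctly negative, 79 false alarms
rules = diagnostic_metrics(tp=133, fp=79, fn=13, tn=435)

if __name__ == "__main__":
    for name, value in rules.items():
        print(f"{name}: {100 * value:.1f}%")
```

With these counts, the positive predictive value follows as 133 / (133 + 79) and the negative predictive value as 435 / (435 + 13), which round to the reported 62.7% and 97.1%.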

CONCLUSIONS AND RELEVANCE

Bleeding is a common complication in health care, and these results demonstrate an automated and scalable detection method. The rules-based natural language processing approach, compared with ML, had the best performance in identifying bleeding, with high sensitivity and negative predictive value.
