Natural language processing of head CT reports to identify intracranial mass effect: CTIME algorithm.

Authors

Gordon Alexandra June, Banerjee Imon, Block Jason, Winstead-Derlega Christopher, Wilson Jennifer G, Mitarai Tsuyoshi, Jarrett Michael, Sanyal Josh, Rubin Daniel L, Wintermark Max, Kohn Michael A

Affiliations

Stanford University, Department of Emergency Medicine, Critical Care, Stanford, CA, United States of America.

Emory University, Department of Biomedical Informatics, Department of Radiology; Georgia Tech, Department of Biomedical Engineering, Atlanta, GA, United States of America.

Publication information

Am J Emerg Med. 2022 Jan;51:388-392. doi: 10.1016/j.ajem.2021.11.001. Epub 2021 Nov 9.

Abstract

BACKGROUND

The Mortality Probability Model (MPM) is used in research and quality improvement to adjust for severity of illness and can also inform triage decisions. However, a limitation for its automated use or application is that it includes the variable "intracranial mass effect" (IME), which requires human engagement with the electronic health record (EHR). We developed and tested a natural language processing (NLP) algorithm to identify IME from CT head reports.

METHODS

We obtained initial head CT reports from adult patients who were admitted to the ICU from our ED between 10/2013 and 9/2016. Each head CT report was labeled yes/no for IME by at least two of five independent labelers. The reports were then randomly divided 80/20 into training and test sets. All reports were preprocessed to remove linguistic and style variability, and a dictionary was created to map similar common terms. We tested three vectorization strategies for converting the report text to a numerical vector: Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec, and Universal Sentence Encoder. This vector served as the input to a classification-tree-based ensemble machine learning algorithm (XGBoost). After training, model performance was assessed in the test set using the area under the receiver operating characteristic curve (AUROC). We also divided the continuous range of scores into positive/inconclusive/negative categories for IME.
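As an illustration of the vectorization step only, the following is a minimal standard-library sketch of TF-IDF, not the authors' implementation. The tokenizer, the smoothed IDF formula, the function names, and the three example report snippets are all assumptions made for this sketch; the abstract does not specify the preprocessing rules or TF-IDF settings actually used, and the XGBoost classifier that consumes these vectors is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a crude stand-in for the
    preprocessing described in the abstract, which is not specified)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(reports):
    """Return (vocabulary, vectors) with one TF-IDF vector per report."""
    docs = [tokenize(r) for r in reports]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of reports that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant) so that terms appearing in every
    # report still receive a nonzero weight.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty reports
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


# Hypothetical report snippets, invented for illustration only.
reports = [
    "No midline shift. No mass effect.",
    "Large subdural hematoma with midline shift and effacement of sulci.",
    "No acute intracranial abnormality.",
]
vocab, vectors = tfidf_vectors(reports)
```

Each report becomes a fixed-length numeric vector indexed by `vocab`, which is the form a tree-based classifier expects as input.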

RESULTS

Of the 1202 CT reports in the training set, 308 (25.6%) were manually labeled as "yes" for IME. Of the 355 reports in the test set, 108 (30.4%) were labeled as "yes" for IME. The TF-IDF vectorization strategy as an input for the XGBoost model had the best AUROC: 0.9625 (95% CI 0.9443-0.9807). TF-IDF score categories were defined and had the following likelihood ratios: "positive" (TF-IDF score > 0.5) LR = 24.59; "inconclusive" (TF-IDF 0.05-0.5) LR = 0.99; and "negative" (TF-IDF < 0.05) LR = 0.05. 82% of reports were classified as either "positive" or "negative". In the test set, only 4 of 199 (2.0%) reports with a "negative" classification were false negatives and only 8 of 93 (8.6%) reports classified as "positive" were false positives.
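The three-way categorization and the category likelihood ratios can be sketched as follows. This is a hedged reconstruction from the counts quoted in the abstract, not the authors' code: the function names are hypothetical, the "inconclusive" counts are derived by subtraction, and a category's likelihood ratio is taken as the proportion of IME-positive reports falling in that category divided by the proportion of IME-negative reports falling in it.

```python
def categorize(score, low=0.05, high=0.5):
    """Map a continuous model score to the three categories in the abstract."""
    if score > high:
        return "positive"
    if score < low:
        return "negative"
    return "inconclusive"


def likelihood_ratio(ime_in_cat, no_ime_in_cat, ime_total, no_ime_total):
    """LR for a category: P(category | IME) / P(category | no IME)."""
    return (ime_in_cat / ime_total) / (no_ime_in_cat / no_ime_total)


# Test-set totals from the abstract: 355 reports, 108 labeled "yes" for IME.
N_IME = 108
N_NO_IME = 355 - N_IME

# (IME-positive reports, IME-negative reports) per category.
counts = {
    "positive": (93 - 8, 8),     # 93 classified positive, 8 false positives
    "negative": (4, 199 - 4),    # 199 classified negative, 4 false negatives
}
# Inconclusive counts are not stated in the abstract; derived by subtraction.
counts["inconclusive"] = (
    N_IME - counts["positive"][0] - counts["negative"][0],
    N_NO_IME - counts["positive"][1] - counts["negative"][1],
)

lrs = {
    cat: likelihood_ratio(ime, no_ime, N_IME, N_NO_IME)
    for cat, (ime, no_ime) in counts.items()
}
```

With these derived counts, the "inconclusive" and "negative" ratios round to the reported 0.99 and 0.05. The "positive" ratio comes out near 24.3 rather than the reported 24.59; the abstract does not give enough detail to explain the small difference.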

CONCLUSION

NLP can accurately identify IME from free-text reports of head CTs in approximately 80% of records, adequate to allow automatic calculation of MPM based on EHR data for many applications.
