Automated Identification of Patients With Immune-Related Adverse Events From Clinical Notes Using Word Embedding and Machine Learning.

Affiliations

Innovation Center for Biomedical Informatics (ICBI), Georgetown University, Washington, DC.

Memorial Sloan Kettering Cancer Center, Manhattan, New York, NY.

Publication information

JCO Clin Cancer Inform. 2021 May;5:541-549. doi: 10.1200/CCI.20.00109.

DOI:10.1200/CCI.20.00109

PMID:33989017

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8462565/

Abstract

PURPOSE

Although immune checkpoint inhibitors (ICIs) have substantially improved survival in patients with advanced malignancies, they are associated with a unique spectrum of side effects termed immune-related adverse events (irAEs). To ensure treatment safety, research efforts are needed to comprehensively detect and understand irAEs. Retrospective analysis of data from electronic health records can provide knowledge to characterize these toxicities. However, such information is not captured in a structured format within the electronic health record and requires manual chart review.

MATERIALS AND METHODS

In this work, we propose a natural language processing pipeline that can automatically annotate clinical notes and determine whether there is evidence that a patient developed an irAE. Seven hundred eighty-one cases were manually reviewed by clinicians and annotated for irAEs at the patient level. A dictionary of irAE keywords was used to perform text reduction on the clinical notes belonging to each patient; only sentences with relevant expressions were kept. Word embeddings were then used to generate vector representations over the reduced text, which served as input for the machine learning classifiers. The output of the models was the presence or absence of any irAE. Additional models were built to classify skin-related toxicities, endocrine toxicities, and colitis.
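The text-reduction and embedding steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword set `IRAE_KEYWORDS`, the three-dimensional `TOY_EMBEDDINGS` table, the naive sentence splitter, and the mean-pooling of word vectors are all hypothetical stand-ins, since the abstract does not specify the dictionary, the embedding model, or how vectors were aggregated.

```python
import re

# Hypothetical keyword dictionary; the actual irAE dictionary is not given in the abstract.
IRAE_KEYWORDS = {"colitis", "rash", "hypothyroidism", "pneumonitis"}

# Toy 3-dimensional vectors standing in for trained word embeddings (assumed values).
TOY_EMBEDDINGS = {
    "colitis": [0.9, 0.1, 0.0],
    "rash": [0.1, 0.9, 0.0],
    "developed": [0.2, 0.2, 0.6],
    "patient": [0.0, 0.1, 0.9],
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def split_sentences(note):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def reduce_text(notes, keywords):
    """Text reduction: keep only sentences containing at least one keyword."""
    kept = []
    for note in notes:
        for sentence in split_sentences(note):
            if keywords & set(tokenize(sentence)):
                kept.append(sentence)
    return kept


def embed(sentences, embeddings, dim=3):
    """Mean-pool the vectors of all in-vocabulary tokens in the kept sentences."""
    total = [0.0] * dim
    count = 0
    for sentence in sentences:
        for token in tokenize(sentence):
            vec = embeddings.get(token)
            if vec is not None:
                total = [t + v for t, v in zip(total, vec)]
                count += 1
    # With no known tokens, fall back to the zero vector.
    return [t / count for t in total] if count else total


# All notes for one (fictional) patient.
notes = [
    "Patient seen in clinic today. Patient developed colitis after cycle 3.",
    "No new complaints.",
]
kept = reduce_text(notes, IRAE_KEYWORDS)
features = embed(kept, TOY_EMBEDDINGS)
```

Here `kept` holds only the sentence mentioning colitis, and `features` is a single patient-level vector. In a full pipeline that vector would be passed, together with the clinician-assigned label, to whichever classifier is being trained.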

RESULTS

The model for any irAE achieved an average F1-score of 0.75 and an area under the receiver operating characteristic curve of 0.85. This outperformed a basic keyword filtering approach. Although the classifier for any irAE achieved good accuracy, individual irAE classification still has room for improvement.
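For readers less familiar with the two metrics reported above, the following sketch computes them from scratch on made-up labels and scores. The data, the 0.5 decision threshold, and the function names are assumptions for illustration only; the abstract does not state how the reported averages were obtained.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up patient-level labels (1 = irAE present) and model scores.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

F1 depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two numbers are usually reported together.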

CONCLUSION

We demonstrate that patient-level annotations combined with a machine learning approach using keyword filtering and word embeddings can achieve promising accuracy in classifying irAEs in clinical notes. This model may facilitate annotation and analysis of large irAE data sets.

