
Automated System to Capture Patient Symptoms From Multitype Japanese Clinical Texts: Retrospective Study.

Affiliations

Department of Information Science, Nara Institute of Science and Technology, Ikoma, Japan.

Graduate School of Medicine, Kyoto University, Kyoto, Japan.

Publication information

JMIR Med Inform. 2024 Sep 24;12:e58977. doi: 10.2196/58977.

DOI:10.2196/58977

PMID:39316418

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11462096/

Abstract

BACKGROUND

Natural language processing (NLP) techniques can be used to analyze large amounts of electronic health record texts, which encompass various types of patient information such as quality of life, effectiveness of treatments, and adverse drug event (ADE) signals. As different aspects of a patient's status are stored in different types of documents, we propose an NLP system capable of processing 6 types of documents: physician progress notes, discharge summaries, radiology reports, radioisotope reports, nursing records, and pharmacist progress notes.

OBJECTIVE

This study aimed to investigate the system's performance in detecting ADEs by evaluating the results from multitype texts. The main objective was to detect adverse events accurately using an NLP system.

METHODS

We used data written in Japanese from 2289 patients with breast cancer, including medication data, physician progress notes, discharge summaries, radiology reports, radioisotope reports, nursing records, and pharmacist progress notes. Our system performs 3 processes: named entity recognition, normalization of symptoms, and aggregation of multiple types of documents from multiple patients. Among all patients with breast cancer, 103 and 112 patients with peripheral neuropathy (PN) received paclitaxel and docetaxel, respectively. We evaluated the utility of using multiple types of documents by correlation coefficient and regression analysis, comparing the performance with that of each single type of document. All evaluations of detection rates with our system were performed 30 days after drug administration.
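The aggregation step and the comparison between all documents and a single document type can be illustrated with a minimal sketch. This is not the authors' implementation: the `Mention` record, the `detection_rate` function, the document-type labels, and the sample data are all hypothetical, and the sketch assumes that "30 days after drug administration" means a 30-day window starting at each patient's first dose.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Mention:
    """One normalized symptom mention (hypothetical record layout)."""
    patient_id: str
    doc_type: str   # e.g. "physician", "nursing", "pharmacist"
    doc_date: date
    symptom: str    # label produced by the normalization step


def detection_rate(mentions, first_dose, symptom, window_days=30, doc_types=None):
    """Fraction of treated patients with at least one mention of `symptom`
    within `window_days` of their first dose.

    `first_dose` maps patient_id -> date of first administration.
    `doc_types` restricts the search to some document types (None = all).
    """
    detected = set()
    for m in mentions:
        start = first_dose.get(m.patient_id)
        if start is None or m.symptom != symptom:
            continue
        if doc_types is not None and m.doc_type not in doc_types:
            continue
        # Assumption: the evaluation window is [first dose, first dose + 30 days].
        if start <= m.doc_date <= start + timedelta(days=window_days):
            detected.add(m.patient_id)
    return len(detected) / len(first_dose) if first_dose else 0.0


# Made-up example data, not taken from the study.
first_dose = {pid: date(2020, 1, 1) for pid in ("p1", "p2", "p3")}
mentions = [
    Mention("p1", "physician", date(2020, 1, 10), "peripheral neuropathy"),
    Mention("p2", "pharmacist", date(2020, 1, 20), "peripheral neuropathy"),
    Mention("p3", "nursing", date(2020, 3, 1), "peripheral neuropathy"),  # outside window
]

all_rate = detection_rate(mentions, first_dose, "peripheral neuropathy")
pharm_rate = detection_rate(
    mentions, first_dose, "peripheral neuropathy", doc_types={"pharmacist"}
)
```

In this toy data, pooling all document types finds more patients than the pharmacist notes alone, which is the kind of comparison the evaluation describes.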

RESULTS

Our system underestimated the incidence of paclitaxel-induced PN by 13.3 percentage points: it estimated 60.7%, compared with 74.0% in previous research based on manual extraction. The Pearson correlation coefficient between the manual extraction and system results was 0.87. Although the pharmacist progress notes had the highest detection rate of any single document type, that rate did not match the performance obtained using all documents. The estimated median duration of PN with paclitaxel was 92 days, whereas the previously reported median duration was 727 days. The number of detected events was highest in the physician progress notes, followed by the pharmacist progress notes and nursing records.
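The two headline figures above, the percentage-point gap and the Pearson correlation, can be reproduced in form with a short sketch. Only the 74.0% and 60.7% incidences come from the abstract. The paired series `manual` and `system` are hypothetical placeholders, since the abstract does not state which paired values the reported 0.87 was computed over.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Incidences reported in the abstract (percent).
manual_pct = 74.0   # previous research, manual extraction
system_pct = 60.7   # this system
gap_points = round(manual_pct - system_pct, 1)  # difference in percentage points

# Hypothetical paired rates (percent); NOT data from the study.
manual = [10.0, 20.0, 30.0, 40.0]
system = [8.0, 18.0, 25.0, 33.0]
r = pearson(manual, system)
```

A high correlation alongside a nonzero gap means the system tracks the manual results closely in relative terms while detecting fewer cases in absolute terms.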

CONCLUSIONS

Considering the inherent cost of care that requires constant monitoring of the patient's condition, such as the treatment of PN, our system has a significant advantage in that it can immediately estimate the treatment duration without fine-tuning a new NLP model. Leveraging multitype documents yields better detection performance than using single-type documents. Although the onset time estimation was relatively accurate, the estimated duration might have been influenced by the length of the data follow-up period. The results suggest that our method, which uses various types of data, can detect more ADEs from clinical documents.
