Tsuchiya Masami, Kawazoe Yoshimasa, Shimamoto Kiminori, Seki Tomohisa, Imai Shungo, Kizaki Hayato, Shinohara Emiko, Yada Shuntaro, Wakamiya Shoko, Aramaki Eiji, Hori Satoko
Division of Drug Informatics, Keio University Faculty of Pharmacy, Minato-ku, Japan.
Artificial Intelligence and Digital Twin in Healthcare, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Japan.
JCO Clin Cancer Inform. 2025 Aug;9:e2500096. doi: 10.1200/CCI-25-00096. Epub 2025 Aug 12.
Capecitabine, an oral anticancer agent, frequently causes hand-foot syndrome (HFS), affecting patients' quality of life and treatment adherence. However, such symptomatic toxicities are often difficult to detect in structured electronic health record (EHR) data. This study primarily aimed to validate a natural language processing (NLP) approach to identifying capecitabine-induced HFS from unstructured clinical text and demonstrate its application in evaluating medication-associated adverse event trends in real-world settings.
We conducted a retrospective cohort study using EHRs from the University of Tokyo Hospital (2004-2021). HFS cases were identified using the MedNERN-CR-JA NLP model. After propensity score matching, we compared capecitabine users with and without celecoxib and assessed time to HFS onset using Cox proportional hazards models. NLP-based HFS detection was validated through manual annotation of aggregated clinical notes. Negative control and sensitivity analyses ensured robustness.
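The abstract does not say which matching algorithm was used. As a minimal sketch of what propensity score matching can look like, the following assumes greedy 1:1 nearest-neighbor matching with a caliper, applied to propensity scores that have already been estimated. The function name `greedy_match`, the caliper of 0.05, and the patient IDs and scores are all hypothetical illustrations, not values from the study.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    treated, controls: dict mapping patient id -> estimated propensity score.
    caliper: maximum allowed score difference for a pair (assumed value).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # controls not yet used in a pair
    pairs = []
    # Process treated patients in order of increasing propensity score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # match without replacement
    return pairs


# Hypothetical scores: "treated" = capecitabine users who received celecoxib,
# "controls" = capecitabine users who did not.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.95}
controls = {"c1": 0.28, "c2": 0.60, "c3": 0.33, "c4": 0.10}
print(greedy_match(treated, controls))
```

In this toy input, `t3` has no control within the caliper and is left unmatched, which is the usual trade-off of caliper matching: closer covariate balance at the cost of a smaller matched cohort.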
Among 44,502 patients with cancer, 669 capecitabine users were analyzed. HFS incidence was significantly higher among capecitabine users (hazard ratio [HR], 1.93 [95% CI, 1.48 to 2.52]; P < .001) compared with nonusers. Celecoxib use showed a suggestive association with a reduced HFS risk (HR, 0.51 [95% CI, 0.24 to 1.07]; P = .073). The NLP model demonstrated high accuracy in identifying HFS, achieving a precision of 0.875, recall of 1.000, and F score of 0.933, based on manual annotation of patient-level clinical notes. Outcome trends remained consistent when using manually annotated HFS case labels instead of NLP-detected events, supporting the method's robustness.
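The reported figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes the F score as the harmonic mean of precision and recall, and approximates a two-sided P value from each hazard ratio and its 95% CI. The second step assumes a Wald test on the log hazard ratio with a symmetric CI on the log scale; the abstract does not state which test the authors used, so small differences from the published P values are expected. The function names are illustrative.

```python
import math


def f_score(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def wald_p_from_ci(hr, lower, upper, z_crit=1.96):
    """Approximate two-sided P value from a hazard ratio and its 95% CI.

    Assumes the CI is symmetric on the log scale (Wald interval),
    which is an assumption, not something the abstract states.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


print(round(f_score(0.875, 1.000), 3))  # 0.933, matching the reported F score

for label, hr, lo, hi in [
    ("capecitabine vs nonusers", 1.93, 1.48, 2.52),
    ("celecoxib vs no celecoxib", 0.51, 0.24, 1.07),
]:
    z, p = wald_p_from_ci(hr, lo, hi)
    print(f"{label}: z = {z:.2f}, approximate P = {p:.4f}")
```

Under these assumptions the capecitabine comparison gives a P value far below .001, and the celecoxib comparison gives a value of roughly .08, close to the reported .073 given that the inputs are rounded to two decimals.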
These findings demonstrate the effectiveness of NLP in detecting HFS from real-world clinical records. The application to celecoxib-HFS detection illustrates the potential utility of this approach for retrospective safety analysis. Further work is needed to evaluate generalizability across diverse clinical settings.