Tariq Amara, Sikha Madhu, Kurian Allison W, Ward Kevin, Keegan Theresa H M, Rubin Daniel L, Banerjee Imon
Department of Radiology, Mayo Clinic, Phoenix, AZ.
Departments of Medicine and of Epidemiology & Population Health, Stanford University School of Medicine, Palo Alto, CA.
JCO Clin Cancer Inform. 2025 Jun;9:e2500002. doi: 10.1200/CCI-25-00002. Epub 2025 Jun 27.
Automated curation of breast cancer treatment data with minimal human involvement could accelerate the collection of statewide and nationwide evidence to guide patient management and assess the effectiveness of treatment pathways. The primary challenges are the complexity and inconsistency of structured clinical data streams and the difficulty of accurately extracting this information from free-text clinical narratives.
We proposed a hybrid two-phase information extraction framework that combined a Unified Medical Language System parser (phase 1) with a fine-tuned large language model (LLM; phase 2) to extract longitudinal treatment timelines from time-stamped clinical notes. Our framework was developed through end-to-end joint learning as a question-answering model trained to answer five questions simultaneously, each corresponding to a specific treatment.
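The two-phase flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the keyword filter merely stands in for the Unified Medical Language System parser, `keyword_answerer` stands in for the fine-tuned LLM, and the treatment names and terms in `TREATMENT_TERMS` are hypothetical, since the abstract does not name the five treatments.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable

# Hypothetical vocabulary: the source names neither the five treatments
# nor the concepts matched in phase 1.
TREATMENT_TERMS = {
    "surgery": ["mastectomy", "lumpectomy"],
    "chemotherapy": ["chemotherapy", "paclitaxel"],
    "radiation": ["radiation", "radiotherapy"],
    "endocrine": ["tamoxifen", "letrozole"],
    "targeted": ["trastuzumab", "pertuzumab"],
}


@dataclass
class Note:
    note_date: date
    text: str


def phase1_filter(note: Note) -> list[str]:
    """Stand-in for the concept parser: keep sentences mentioning any term."""
    terms = [t for group in TREATMENT_TERMS.values() for t in group]
    sentences = [s.strip() for s in note.text.split(".") if s.strip()]
    return [s for s in sentences if any(t in s.lower() for t in terms)]


# An answerer maps the filtered sentences to one yes/no answer per treatment.
Answerer = Callable[[list[str]], dict[str, bool]]


def keyword_answerer(sentences: list[str]) -> dict[str, bool]:
    """Placeholder for the fine-tuned model: answers all five questions at once."""
    joined = " ".join(sentences).lower()
    return {
        name: any(term in joined for term in terms)
        for name, terms in TREATMENT_TERMS.items()
    }


def build_timeline(notes: list[Note], answer: Answerer = keyword_answerer):
    """Return (date, answers) pairs in chronological order for relevant notes."""
    timeline = []
    for note in sorted(notes, key=lambda n: n.note_date):
        kept = phase1_filter(note)
        if kept:  # notes with no treatment mention never reach phase 2
            timeline.append((note.note_date, answer(kept)))
    return timeline
```

Passing the second phase as a callable keeps the sketch independent of any particular model; a real system would supply a function that queries the fine-tuned LLM instead of `keyword_answerer`.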
We fine-tuned and internally validated the model on 26,692 patients with breast cancer (diagnosed between 2013 and 2020) receiving treatment at Mayo Clinic and externally validated the model on 162 randomly selected patients from Stanford Healthcare. The zero-shot (out-of-the-box) LLM had high specificity but low sensitivity, indicating that although such models are useful for generic language understanding, they fall short on targeted clinical tasks. The proposed model achieved an average AUROC of 0.942 on the internal data and 0.924 on the external data, demonstrating only a marginal drop in performance under external evaluation. The proposed model also achieved a better trade-off between sensitivity (average: 79.2%) and specificity (average: 76.2%) than the rule-based approach (average sensitivity: 70.5%, average specificity: 68.1%) and structured codes (average sensitivity: 64.1%, average specificity: 83.5%).
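For readers less familiar with the metrics, the sketch below shows how sensitivity and specificity can be computed per treatment and then averaged. It assumes the reported averages are simple (macro) means across treatments, which the abstract does not state; the function names are illustrative, and any labels passed in are the caller's own, not study data.

```python
import math


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = treatment given)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    # Guard against treatments with no positive (or no negative) cases.
    sens = tp / (tp + fn) if (tp + fn) else math.nan
    spec = tn / (tn + fp) if (tn + fp) else math.nan
    return sens, spec


def macro_average(per_treatment):
    """Average (sensitivity, specificity) pairs across treatments."""
    sens = [s for s, _ in per_treatment.values()]
    spec = [s for _, s in per_treatment.values()]
    return sum(sens) / len(sens), sum(spec) / len(spec)
```

A high-specificity, low-sensitivity result, as reported for the zero-shot LLM, corresponds to few false positives but many missed treatments.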
The proposed framework can extract temporal information about cancer treatments from various time-stamped clinic notes, regardless of the setting of treatment administration (inpatient or outpatient) or time frame. To support the cancer research community in such data curation and longitudinal analysis, we have packaged the code as a Docker image, which needs minimal system reconfiguration, and shared it under an open-source academic license.