Barlow Stephen H, Chicklore Sugama, He Yulan, Ourselin Sebastien, Wagner Thomas, Barnes Anna, Cook Gary J R
School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK.
King's College London and Guy's and St. Thomas' PET Centre, St. Thomas' Hospital, London, UK.
BMC Med Inform Decis Mak. 2024 Dec 18;24(1):396. doi: 10.1186/s12911-024-02814-7.
[18F]Fluorodeoxyglucose (FDG) PET-CT is a clinical imaging modality widely used in diagnosing and staging lung cancer. The clinical findings of PET-CT studies are recorded in free-text reports, which can currently only be categorised by experts reading them manually. Pre-trained transformer-based language models (PLMs) have shown success in extracting complex linguistic features from text. Accordingly, we developed a multi-task 'TNMu' classifier to classify the presence or absence of tumour, node and metastasis ('TNM') findings, as defined by the Eighth Edition of TNM Staging for Lung Cancer. This is combined with an uncertainty classification task ('u') to account for studies with ambiguous TNM status.
A total of 2498 reports were annotated by a nuclear medicine physician and split into training, validation and test datasets. For additional evaluation, an external dataset (n = 461 reports) was created and annotated by two nuclear medicine physicians, who reached agreement on all examples. We trained and evaluated eleven publicly available PLMs to determine which is most effective for PET-CT reports, and compared multi-task, single-task and traditional machine learning approaches.
We find that a multi-task approach with GatorTron as the PLM achieves the best performance, with an overall accuracy (all four tasks correct) of 84% and a Hamming loss of 0.05 on the internal test dataset, and 79% and 0.07 respectively on the external test dataset. Performance on the individual T, N and M tasks approached expert performance, with macro-average F1 scores of 0.91, 0.95 and 0.90 respectively on external data. For the uncertainty task, an F1 score of 0.77 is achieved.
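To illustrate how the reported metrics differ, the sketch below computes overall (exact-match) accuracy, Hamming loss and per-task macro-averaged F1 for reports carrying four binary labels. The label order (T, N, M, u), the function names and the toy labels are hypothetical and are not taken from the study; this is a plain-Python sketch under those assumptions, not the authors' evaluation code. Overall accuracy credits a report only if all four labels are right, whereas Hamming loss counts the fraction of individual labels that are wrong, so a single error costs a whole report in the first metric but only one label in the second.

```python
from typing import Sequence

# Hypothetical label layout: each report is (T, N, M, u), 0 = absent, 1 = present.
Labels = Sequence[int]


def exact_match_accuracy(y_true: Sequence[Labels], y_pred: Sequence[Labels]) -> float:
    """Fraction of reports where every task label is predicted correctly."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return hits / len(y_true)


def hamming_loss(y_true: Sequence[Labels], y_pred: Sequence[Labels]) -> float:
    """Fraction of individual (report, task) labels that are wrong."""
    n_tasks = len(y_true[0])
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / (len(y_true) * n_tasks)


def binary_macro_f1(truth: Sequence[int], pred: Sequence[int]) -> float:
    """Unweighted mean of the F1 for class 1 (present) and class 0 (absent)."""
    scores = []
    for cls in (1, 0):
        tp = sum(1 for t, p in zip(truth, pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(truth, pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(truth, pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels for four reports (not study data).
y_true = [(1, 1, 0, 0), (1, 0, 0, 1), (0, 0, 0, 0), (1, 1, 1, 0)]
y_pred = [(1, 1, 0, 0), (1, 0, 1, 1), (0, 0, 0, 0), (1, 0, 1, 0)]

print(exact_match_accuracy(y_true, y_pred))  # 0.5: two reports have an error
print(hamming_loss(y_true, y_pred))          # 0.125: 2 wrong labels out of 16
n_true = [t[1] for t in y_true]              # N-task column
n_pred = [p[1] for p in y_pred]
print(binary_macro_f1(n_true, n_pred))       # mean of 2/3 and 4/5
```

In this toy case, two wrong labels halve the exact-match accuracy but raise the Hamming loss only to 0.125, which is why the abstract reports both figures.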
Our 'TNMu' classifier successfully extracts TNM staging information from internal and external PET-CT reports. We conclude that multi-task approaches give the best performance, as well as better computational efficiency than single-task PLM approaches. We believe these models can improve PET-CT services by assisting in auditing, creating research cohorts and developing decision support systems. Our approach to handling uncertainty represents a novel first step but has room for further refinement.