Lian Wenyi, Lindblad Joakim, Runow Stark Christina, Hirsch Jan-Michaél, Sladoje Nataša
Centre for Image Analysis, Department of Information Technology, Uppsala University, Uppsala, Sweden.
Department of Surgical Sciences, Uppsala University, Uppsala, Sweden; Folktandvården, Region Uppsala, Uppsala, Sweden.
Comput Biol Med. 2025 Feb;185:109498. doi: 10.1016/j.compbiomed.2024.109498. Epub 2024 Dec 10.
Oral cancer is a global health challenge. The disease can be treated successfully if detected early, but survival rates drop significantly for late-stage cases. There is growing interest in a shift from the current standard of invasive and time-consuming tissue sampling and histological examination towards non-invasive brush biopsies and cytological examination, which facilitates continued monitoring of risk groups. Cost-effective and accurate cytological analysis calls for reliable computer-assisted, data-driven approaches. However, the infeasibility of accurate cell-level annotation hinders model performance and limits evaluation and interpretation of the results. This study aims to improve AI-based oral cancer detection by introducing additional information through multimodal imaging and deep multimodal information fusion.
We combine brightfield and fluorescence whole slide microscopy imaging to analyze Papanicolaou-stained liquid-based cytology slides of brush biopsies collected from both healthy and cancer patients. Given the challenge of detailed cytological annotations, we utilize a weakly supervised deep learning approach only relying on patient-level labels. We evaluate various multimodal information fusion strategies, including early, late, and three recent intermediate fusion methods.
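The three fusion families evaluated here differ in where the modalities are joined. As a minimal illustrative sketch (toy numpy code with made-up feature dimensions and a heavily simplified attention step, not the architectures used in the study), the contrast between early, late, and intermediate co-attention-style fusion can be shown as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-cell feature vectors from each imaging modality.
bf = rng.standard_normal(8)   # brightfield features
fl = rng.standard_normal(8)   # fluorescence features

def classify(x, w):
    """Toy linear classifier head returning a scalar logit."""
    return float(x @ w)

w_joint = rng.standard_normal(16)
w_bf = rng.standard_normal(8)
w_fl = rng.standard_normal(8)

# Early fusion: concatenate low-level features, then one shared classifier.
early_logit = classify(np.concatenate([bf, fl]), w_joint)

# Late fusion: classify each modality independently, then combine decisions.
late_logit = 0.5 * (classify(bf, w_bf) + classify(fl, w_fl))

# Intermediate fusion (co-attention flavor): each modality re-weights the
# other's features before a joint head. This element-wise "affinity" is a
# stand-in for learned query/key attention, kept scalar-simple on purpose.
def coattend(query, other):
    affinity = np.exp(query * other)
    affinity /= affinity.sum()
    return affinity * other

fused = np.concatenate([coattend(bf, fl), coattend(fl, bf)])
intermediate_logit = classify(fused, w_joint)
```

The practical trade-off the abstract alludes to: early fusion requires pixel- or feature-aligned inputs (hence the importance of registration noted in the results), late fusion ignores cross-modal interactions, and intermediate fusion lets each modality condition the other's representation.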
Our experiments demonstrate that: (i) there is substantial diagnostic information to gain from fluorescence imaging of Papanicolaou-stained cytological samples, (ii) multimodal information fusion improves classification performance and cancer detection accuracy, compared to single-modality approaches. Intermediate fusion emerges as the leading method among the studied approaches. Specifically, the Co-Attention Fusion Network (CAFNet) model achieves impressive results, with an F1 score of 83.34% and an accuracy of 91.79% at cell level, surpassing human performance on the task. Additional tests highlight the importance of accurate image registration to maximize the benefits of the multimodal analysis.
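The reported cell-level F1 score and accuracy follow the standard confusion-matrix definitions. As a quick worked example (the counts below are invented for illustration, not the study's results):

```python
# Illustrative confusion-matrix counts for a binary cell classifier.
tp, fp, fn, tn = 80, 10, 22, 88

precision = tp / (tp + fp)                              # 80/90
recall = tp / (tp + fn)                                 # 80/102
f1 = 2 * precision * recall / (precision + recall)      # harmonic mean
accuracy = (tp + tn) / (tp + fp + fn + tn)              # fraction correct
```

With these counts, f1 works out to exactly 5/6 and accuracy to 0.84; F1 is preferred alongside accuracy because cytology datasets are typically imbalanced toward healthy cells.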
This study advances the field of cytopathology by integrating deep learning, multimodal imaging, and information fusion to enhance non-invasive early detection of oral cancer. Our approach not only improves diagnostic accuracy but also enables an efficient, uncomplicated clinical workflow. The developed pipeline has potential applications in other cytological analysis settings. We provide a validated open-source analysis framework and share a unique multimodal oral cancer dataset to support further research and innovation.