School of Data Science, University of Science and Technology of China, Hefei, PR China.
Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei, PR China.
J Pathol Clin Res. 2023 May;9(3):223-235. doi: 10.1002/cjp2.312. Epub 2023 Feb 1.
Many artificial intelligence models have been developed to predict clinically relevant biomarkers for colorectal cancer (CRC), including microsatellite instability (MSI). However, existing deep learning networks require large training datasets, which are often hard to obtain. In this study, we built on the Hierarchical Vision Transformer using Shifted Windows (Swin Transformer [Swin-T]) to develop an efficient workflow that predicts CRC biomarkers (MSI, hypermutation, chromosomal instability, CpG island methylator phenotype, and BRAF and TP53 mutation) from relatively small datasets. Our Swin-T workflow achieved state-of-the-art (SOTA) predictive performance in an intra-study cross-validation experiment on the Cancer Genome Atlas colon and rectal cancer dataset (TCGA-CRC-DX). It also generalized well in cross-study external validation: trained on the Molecular and Cellular Oncology dataset (N = 1,065) and tested on TCGA-CRC-DX (N = 462), it delivered a SOTA area under the receiver operating characteristic curve (AUROC) of 0.90 for MSI. A recent study reported similar performance (AUROC = 0.91) on the same testing dataset, but used a ResNet18 model trained on ~8,000 samples. Swin-T remained efficient with small training datasets and exhibited robust predictive performance with 200-500 training samples. Our findings indicate that Swin-T could be 5-10 times more data-efficient for MSI prediction than existing algorithms based on ResNet18 and ShuffleNet. Furthermore, the Swin-T models predicted MSI and BRAF mutation status accurately enough to serve as a pre-screening step in a cascading diagnostic workflow: samples predicted negative could be excluded before subsequent standard testing, reducing turnaround time and costs.
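The two quantities the abstract relies on, AUROC and the share of samples a pre-screening model could exclude before standard testing, can be illustrated with a minimal Python sketch. This is not the study's evaluation code. The function names `auroc` and `rule_out_fraction`, the `min_sensitivity` parameter, and the toy labels and scores are all assumptions introduced here for illustration.

```python
import math


def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rule_out_fraction(labels, scores, min_sensitivity=0.95):
    """Return (threshold, fraction excluded): the highest score threshold
    that still flags at least min_sensitivity of the positives, and the
    fraction of all samples scoring below it (hypothetical rule-out step)."""
    pos = sorted(s for y, s in zip(labels, scores) if y == 1)
    if not pos:
        raise ValueError("need at least one positive sample")
    # Number of positives that must stay at or above the threshold.
    needed = max(1, math.ceil(min_sensitivity * len(pos)))
    threshold = pos[len(pos) - needed]
    excluded = sum(1 for s in scores if s < threshold)
    return threshold, excluded / len(scores)


# Toy data, invented for illustration: 1 = MSI, 0 = microsatellite stable.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

print(auroc(toy_labels, toy_scores))
print(rule_out_fraction(toy_labels, toy_scores, min_sensitivity=1.0))
```

In a cascading workflow of the kind described, only samples at or above the threshold would go on to standard testing. Which sensitivity level to require is a clinical decision that this sketch does not address.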