Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, New York.
Paige AI, New York, New York.
Cancer Res. 2024 Oct 15;84(20):3478-3489. doi: 10.1158/0008-5472.CAN-24-1322.
Artificial intelligence (AI) systems can improve cancer diagnosis, yet their development often relies on subjective histologic features as ground truth for training. Herein, we developed an AI model applied to histologic whole-slide images using CDH1 biallelic mutations, pathognomonic for invasive lobular carcinoma (ILC) in breast neoplasms, as ground truth. The model accurately predicted CDH1 biallelic mutations (accuracy = 0.95) and diagnosed ILC (accuracy = 0.96). A total of 74% of samples classified by the AI model as having CDH1 biallelic mutations but lacking these alterations displayed alternative CDH1 inactivating mechanisms, including a deleterious CDH1 fusion gene and noncoding CDH1 genetic alterations. Analysis of internal and external validation cohorts demonstrated 0.95 and 0.89 accuracy for ILC diagnosis, respectively. The latent features of the AI model correlated with human-explainable histopathologic features. Taken together, this study reports the construction of an AI algorithm trained using a genetic rather than histologic ground truth that can robustly classify ILCs and uncover CDH1 inactivating mechanisms, providing the basis for using orthogonal ground truths to develop diagnostic AI models applied to whole-slide images. Significance: Genetic alterations with strong genotypic-phenotypic correlations can be utilized to develop AI systems applied to pathology that facilitate cancer diagnosis and biologic discoveries.