Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.
Department of Electrical Engineering, Stanford University, Stanford, California, USA.
J Am Med Inform Assoc. 2020 May 1;27(5):757-769. doi: 10.1093/jamia/ocz230.
Non-small cell lung cancer is a leading cause of cancer death worldwide, and histopathological evaluation plays the primary role in its diagnosis. However, the morphological patterns associated with the molecular subtypes have not been systematically studied. To bridge this gap, we developed a quantitative histopathology analytic framework to identify the types and gene expression subtypes of non-small cell lung cancer objectively.
We processed whole-slide histopathology images from patients with lung adenocarcinoma (n = 427) and lung squamous cell carcinoma (n = 457) in The Cancer Genome Atlas. We built convolutional neural networks to classify the histopathology images, evaluated their performance by the area under the receiver-operating characteristic curve (AUC), and validated the results in an independent cohort (n = 125).
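As an illustration of the evaluation metric only, the sketch below computes an AUC from binary labels and classifier scores using the pairwise (Mann-Whitney) definition: the fraction of positive-negative pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study's data or code.

```python
def roc_auc(labels, scores):
    """Return the AUC for binary labels (1 = positive, 0 = negative).

    Uses the pairwise definition: the probability that a randomly chosen
    positive example scores higher than a randomly chosen negative one,
    counting ties as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = tumor tile, 0 = benign tile (illustrative only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The quadratic pairwise loop is chosen for clarity; for large image sets a rank-based implementation from an established statistics library would normally be preferred.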
To establish neural networks for quantitative image analyses, we first built convolutional neural network models that distinguished tumor regions from adjacent dense benign tissues (AUCs > 0.935) and recapitulated expert pathologists' diagnoses (AUCs > 0.877), with the results validated in an independent cohort (AUCs = 0.726-0.864). We further demonstrated that quantitative histopathology morphology features identified the major transcriptomic subtypes of both adenocarcinoma and squamous cell carcinoma (P < .01).
Our study is the first to classify the transcriptomic subtypes of non-small cell lung cancer using fully automated machine learning methods. Our approach does not rely on prior pathology knowledge and can discover novel clinically relevant histopathology patterns objectively. The developed procedure is generalizable to other tumor types or diseases.