Srivastava Arunima, Kulkarni Chaitanya, Huang Kun, Parwani Anil, Mallick Parag, Machiraju Raghu
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA.
School of Medicine, Indiana University, Indianapolis, IN, USA.
Biomed Inform Insights. 2018 Oct 31;10:1178222618807481. doi: 10.1177/1178222618807481. eCollection 2018.
Convolutional neural networks (CNNs) have gained steady popularity as a tool for the automatic classification of whole slide histology images. While CNNs have proven to be powerful classifiers in this context, they fail to explain their classifications, as the network-engineered features used for modeling and classification are interpretable only by the CNNs themselves. This work aims at enhancing a traditional neural network model to perform histology image modeling, patient classification, and interpretation of the distinctive features identified by the network within histology whole slide images (WSIs). We synthesize a workflow that (a) intelligently samples the training data by automatically selecting only image areas that display a visible disease-relevant tissue state and (b) isolates the regions most pertinent to the trained CNN's prediction and translates them into observable, qualitative features such as color, intensity, cell and tissue morphology, and texture. We use The Cancer Genome Atlas's Breast Invasive Carcinoma (TCGA-BRCA) histology dataset to build a model predicting patient attributes (disease stage and node status) and the Tumor Proliferation Assessment Challenge (TUPAC 2016) breast cancer histology image repository to help identify disease-relevant tissue state (mitotic activity). We find that our enhanced CNN-based workflow both increased patient attribute predictive accuracy (~2% increase for disease stage and ~10% increase for node status) and demonstrated experimentally that a data-driven CNN histology model predicting breast invasive carcinoma stage is highly sensitive to features such as color, cell size and shape, granularity, and uniformity. This work underscores the need to understand the widely trusted models built using deep learning and adds a layer of biological context to a technique that has until now functioned purely as a classifier.
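To make the two workflow steps concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a WSI held as a NumPy array and two hypothetical callables: `mitosis_score`, a stand-in for a TUPAC-style tissue-state scorer used in step (a), and `predict`, the trained attribute classifier's confidence for a patch. Occlusion-style masking is used here only as one plausible way to isolate prediction-relevant regions in step (b); the paper's actual region-isolation method may differ.

```python
import numpy as np

PATCH = 64  # illustrative patch edge length in pixels


def sample_relevant_patches(wsi, mitosis_score, threshold=0.5):
    """Step (a): keep only patches whose disease-relevant tissue-state
    score (here, a hypothetical mitotic-activity score) exceeds a threshold."""
    h, w = wsi.shape[:2]
    kept = []
    for y in range(0, h - PATCH + 1, PATCH):
        for x in range(0, w - PATCH + 1, PATCH):
            patch = wsi[y:y + PATCH, x:x + PATCH]
            if mitosis_score(patch) > threshold:
                kept.append(((y, x), patch))
    return kept


def occlusion_saliency(patch, predict, window=16, stride=8):
    """Step (b): isolate regions most pertinent to the trained model by
    masking small windows and measuring the drop in prediction confidence."""
    base = predict(patch)
    saliency = np.zeros(patch.shape[:2], dtype=float)
    for y in range(0, patch.shape[0] - window + 1, stride):
        for x in range(0, patch.shape[1] - window + 1, stride):
            masked = patch.copy()
            masked[y:y + window, x:x + window] = patch.mean()  # occlude window
            saliency[y:y + window, x:x + window] += base - predict(masked)
    return saliency  # high values mark regions the prediction depends on
```

The resulting saliency map can then be thresholded, and the highlighted regions summarized with the qualitative descriptors named above (color, intensity, morphology, texture).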