Anand Deepak, Kurian Nikhil Cherian, Dhage Shubham, Kumar Neeraj, Rane Swapnil, Gann Peter H, Sethi Amit
Department of Electrical Engineering, IIT Bombay, Mumbai, Maharashtra, India.
Department of Computing Science, University of Alberta, Edmonton, Canada.
J Pathol Inform. 2020 Jul 24;11:19. doi: 10.4103/jpi.jpi_10_20. eCollection 2020.
Several therapeutically important mutations in cancers are economically detected using immunohistochemistry (IHC), which highlights the overexpression of specific antigens associated with the mutation. However, IHC panels can be imprecise and relatively expensive in low-income settings. Hematoxylin and eosin (H&E) staining, by contrast, is used to visualize general tissue morphology and is routine and low-cost, but it does not highlight any specific antigen or mutation.
Using the human epidermal growth factor receptor 2 (HER2) mutation in breast cancer as an example, we strengthen the case for cost-effective detection and screening of HER2 protein overexpression in H&E-stained tissue.
We use computational methods that reliably detect subtle morphological changes associated with the overexpression of mutation-specific proteins directly from H&E images.
We trained a classification pipeline to determine the HER2 overexpression status of H&E-stained whole-slide images. Our training dataset consisted of 26 cases (11 HER2+ and 15 HER2-) from a single hospital. We tested the pipeline on 26 held-out cases (8 HER2+ and 18 HER2-) from the same hospital and on 45 independent cases (23 HER2+ and 22 HER2-) from the TCGA-BRCA cohort. The pipeline consisted of a stain separation module followed by three deep neural network modules in tandem, for robustness and interpretability.
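The abstract does not specify how the stain separation module works. One widely used method is Ruifrok and Johnston's color deconvolution, which converts RGB intensities to optical density via the Beer-Lambert law and then unmixes each pixel with the inverse of a stain matrix. The sketch below assumes that method and the reference hematoxylin and eosin vectors that scikit-image ships in `rgb_from_hed`; `stain_matrix` and `deconvolve_he` are hypothetical names, and the paper's actual module may differ.

```python
import numpy as np

# ASSUMPTION: reference H and E optical-density vectors (Ruifrok & Johnston),
# as in scikit-image's rgb_from_hed; the paper does not state its stain vectors.
H_VEC = np.array([0.65, 0.70, 0.29])
E_VEC = np.array([0.07, 0.99, 0.11])


def stain_matrix(h=H_VEC, e=E_VEC):
    """Return a 3x3 matrix whose rows are unit OD vectors for H, E, and a residual channel."""
    h = h / np.linalg.norm(h)
    e = e / np.linalg.norm(e)
    # The residual direction is orthogonal to both stains and absorbs unexplained color.
    r = np.cross(h, e)
    r = r / np.linalg.norm(r)
    return np.stack([h, e, r])


def deconvolve_he(rgb):
    """Color deconvolution: map an (H, W, 3) RGB image to (H, W, 3) stain concentrations."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Beer-Lambert: optical density = -log10(transmitted / incident), with I0 = 255.
    # Clipping at 1 avoids log10(0) for fully dark pixels.
    od = -np.log10(np.clip(rgb, 1.0, 255.0) / 255.0)
    # Each pixel satisfies OD = C @ M (row vectors), so C = OD @ inv(M).
    m_inv = np.linalg.inv(stain_matrix())
    conc = od.reshape(-1, 3) @ m_inv
    return conc.reshape(rgb.shape)
```

Channel 0 of the result approximates hematoxylin concentration (mostly nuclei), channel 1 approximates eosin (cytoplasm and stroma), and channel 2 holds whatever color neither stain explains. Feeding such channels to the downstream networks is one way a pipeline can reduce sensitivity to staining variation between hospitals.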
We evaluated our trained model using the area under the receiver operating characteristic curve (AUC).
Our pipeline achieved an AUC of 0.82 (confidence interval [CI]: 0.65-0.98) on held-out cases and an AUC of 0.76 (CI: 0.61-0.89) on the independent dataset from TCGA. We also demonstrate the region-level correspondence of HER2 overexpression between a patient's IHC and H&E serial sections.
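The abstract does not say how the confidence intervals were computed. A common choice for small cohorts is a percentile bootstrap over cases. The sketch below assumes binary slide-level labels (1 = HER2+, 0 = HER2-) and one predicted score per slide. It computes AUC as the probability that a random positive slide outscores a random negative one (the Mann-Whitney formulation), with ties counting one half. `auc` and `bootstrap_auc_ci` are hypothetical names, not the authors' code.

```python
import random


def auc(labels, scores):
    """ROC AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling cases (slides) with replacement."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        # AUC is undefined when a resample contains only one class, so skip it.
        if 0 < sum(ys) < len(ys):
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical scores for four slides: 3 of the 4 positive/negative pairs are ranked correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop costs O(P x N) comparisons, which is negligible for cohorts of a few dozen slides like those described here. With so few cases, bootstrap intervals are wide, which is consistent with the broad CIs reported above.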
Our work strengthens the case for automatically quantifying the overexpression of mutation-specific proteins in H&E-stained digital pathology, and it highlights the importance of multi-stage machine learning pipelines for added robustness and interpretability.