Chidester Benjamin, Do Minh N, Ma Jian
Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Pac Symp Biocomput. 2018;23:319-330.
Connecting genotypes to image phenotypes is crucial for a comprehensive understanding of cancer. Learning such connections requires new machine learning approaches that better integrate imaging and genomic data. Here we propose a novel approach called Discriminative Bag-of-Cells (DBC) for predicting genomic markers from imaging features. DBC addresses the challenge of summarizing histopathological images by representing cells with learned discriminative types, or codewords. We also developed a reliable and efficient patch-based nuclear segmentation scheme using convolutional neural networks, from whose segmentations nuclear and cellular features are extracted. Applying DBC to TCGA breast cancer samples to predict basal subtype status yielded a class-balanced accuracy of 70% on a separate test partition of 213 patients. As data sets combining imaging and genomic data become increasingly available, we believe DBC will be a useful approach for screening histopathological images for genomic markers. Source code for nuclear segmentation and DBC is available at: https://github.com/bchidest/DBC.
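The abstract does not describe how DBC learns its codewords, so the following is only a minimal sketch of two generic pieces it mentions. The first summarizes one patient's cells as a normalized histogram over codewords (the standard bag-of-words step). The second computes class-balanced accuracy, taken here as its standard definition: the mean of per-class recall. The codewords are assumed to be given in advance (for example from k-means) rather than learned discriminatively as in DBC. Cells are hard-assigned to their nearest codeword by squared Euclidean distance. The function names `bag_of_cells_histogram` and `class_balanced_accuracy` are hypothetical illustrations and do not come from the authors' repository.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_cells_histogram(cell_features, codewords):
    """Summarize one patient's cells as a normalized histogram over codewords.

    Assumption: codewords are fixed prototypes supplied by the caller
    (e.g. k-means centers), not discriminatively learned as in DBC.
    """
    if not cell_features:
        raise ValueError("need at least one cell to build a histogram")
    counts = [0] * len(codewords)
    for cell in cell_features:
        # Hard-assign the cell to its nearest codeword (its "cell type").
        nearest = min(range(len(codewords)),
                      key=lambda k: squared_distance(cell, codewords[k]))
        counts[nearest] += 1
    # Normalize so patients with different numbers of cells are comparable.
    total = len(cell_features)
    return [c / total for c in counts]


def class_balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so a minority class counts as much as a majority one."""
    correct = Counter()
    total = Counter()
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    codewords = [[0.0, 0.0], [5.0, 5.0]]
    cells = [[0.1, -0.2], [4.9, 5.2], [5.1, 4.8], [0.3, 0.1]]
    print(bag_of_cells_histogram(cells, codewords))  # [0.5, 0.5]

    # 1 = basal, 0 = non-basal (hypothetical labels)
    y_true = [1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 1]
    print(class_balanced_accuracy(y_true, y_pred))  # 0.625
```

In the toy labels above, plain accuracy would be 4/6 (about 0.667). The class-balanced value is 0.625 because the minority class's recall (1/2) is weighted equally with the majority class's recall (3/4). This is why a balanced metric is a natural choice when one subtype is much rarer than the other.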