Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong, Australia.
BMC Med Genomics. 2020 Feb 24;13(Suppl 3):20. doi: 10.1186/s12920-020-0658-5.
Breast cancer is a collection of multiple tissue pathologies, each with a distinct molecular signature that correlates with patient prognosis and response to therapy. Accurately differentiating between breast cancer sub-types is an important part of clinical decision-making. Although this problem has been addressed using machine learning methods in the past, there remains unexplained heterogeneity within the established sub-types that cannot be resolved by the commonly used classification algorithms.
In this paper, we propose a novel deep learning architecture, called DeepTRIAGE (Deep learning for the TRactable Individualised Analysis of Gene Expression), which uses an attention mechanism to obtain personalised biomarker scores that describe how important each gene is in predicting the cancer sub-type for each sample. We then perform a principal component analysis of these biomarker scores to visualise the sample heterogeneity, and use a linear model to test whether the major principal axes associate with known clinical phenotypes.
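The analysis flow described above (per-sample attention scores, then a principal component analysis of those scores, then a linear model against a clinical phenotype) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the DeepTRIAGE implementation: the attention weights here are random and untrained, the data are synthetic, and every name (`attention_scores`, `gene_embed`, `w_att`, `stage`) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_scores(x, gene_embed, w_att):
    """Return one score per gene per sample; each row sums to 1.

    x:          (n_samples, n_genes) expression matrix
    gene_embed: (n_genes, d) gene embedding (assumed; untrained here)
    w_att:      (d,) attention projection vector (assumed; untrained here)
    """
    h = x[:, :, None] * gene_embed[None, :, :]   # (n_samples, n_genes, d)
    logits = np.tanh(h) @ w_att                  # (n_samples, n_genes)
    return softmax(logits, axis=1)

def pca(scores, k=2):
    # PCA via SVD of the column-centred score matrix.
    centred = scores - scores.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    explained = s**2 / (s**2).sum()
    return u[:, :k] * s[:k], explained[:k]

def linear_association(pc, phenotype):
    # Ordinary least squares of one principal axis on a phenotype,
    # with an intercept. Returns the slope and R^2.
    design = np.column_stack([np.ones_like(phenotype), phenotype])
    beta, *_ = np.linalg.lstsq(design, pc, rcond=None)
    resid = pc - design @ beta
    ss_tot = ((pc - pc.mean()) ** 2).sum()
    return beta[1], 1.0 - (resid**2).sum() / ss_tot

# Synthetic stand-in data: 40 samples, 12 genes, embedding size 4.
n_samples, n_genes, d = 40, 12, 4
x = rng.normal(size=(n_samples, n_genes))
gene_embed = rng.normal(size=(n_genes, d))
w_att = rng.normal(size=d)
stage = rng.integers(1, 5, size=n_samples).astype(float)  # made-up tumour stage

scores = attention_scores(x, gene_embed, w_att)   # personalised scores
pcs, explained = pca(scores, k=2)                 # major principal axes
slope, r2 = linear_association(pcs[:, 0], stage)  # PC1 versus stage
```

In a trained model the embedding and attention parameters would be learned jointly with the classifier, so the scores would reflect each gene's contribution to the sub-type prediction. With random weights, as here, the sketch only shows the shape of the computation.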
Our model not only classifies cancer sub-types with good accuracy, but simultaneously assigns each patient their own set of interpretable and individualised biomarker scores. These personalised scores describe how important each feature is in the classification of any patient, and can be analysed post-hoc to generate new hypotheses about latent heterogeneity.
We apply the DeepTRIAGE framework to classify the gene expression signatures of luminal A and luminal B breast cancer sub-types, and illustrate its use for genes as well as for the GO and KEGG gene sets. Using DeepTRIAGE, we calculate personalised biomarker scores that describe the most important features for classifying an individual patient as luminal A or luminal B. In doing so, DeepTRIAGE also reveals heterogeneity within the luminal A biomarker scores that associates significantly with tumour stage, placing all luminal samples along a continuum of severity.