Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, UK.
Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK.
Mol Syst Biol. 2020 Mar;16(3):e9083. doi: 10.15252/msb.20199083.
Characterising context-dependent gene functions is crucial for understanding the genetic bases of health and disease. To date, inference of gene functions from large-scale genetic perturbation screens has relied on ad hoc analysis pipelines involving unsupervised clustering and functional enrichment. We present Knowledge- and Context-driven Machine Learning (KCML), a framework that systematically predicts multiple context-specific functions for a given gene based on the similarity of its perturbation phenotype to the phenotypes of genes with known function. As a proof of concept, we test KCML on three datasets describing phenotypes at the molecular, cellular and population levels and show that it outperforms traditional analysis pipelines. In particular, KCML identified an abnormal multicellular organisation phenotype associated with the depletion of olfactory receptors and of TGFβ and WNT signalling genes in colorectal cancer cells. We validate these predictions in colorectal cancer patients and show that olfactory receptor expression is predictive of worse patient outcomes. These results highlight KCML as a systematic framework for discovering novel scale-crossing and context-dependent gene functions. KCML is highly generalisable and applicable to various large-scale genetic perturbation screens.
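The core idea stated in the abstract, assigning functions to a gene according to how closely its perturbation phenotype resembles the phenotypes of genes with known function, can be illustrated with a minimal sketch. This is not the KCML implementation, which the abstract does not detail. The cosine-similarity scoring, the `threshold` value, the function names and the toy gene and term names below are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two phenotype vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_functions(profiles, annotations, gene, threshold=0.8):
    """Return every function term whose annotated genes resemble `gene` on average.

    profiles:    gene -> list of phenotype features from a perturbation screen
    annotations: function term -> genes already annotated with that term
    threshold:   hypothetical cut-off; a real pipeline would tune or learn it
    """
    query = profiles[gene]
    predicted = {}
    for term, members in annotations.items():
        # Reference phenotypes: annotated genes other than the query itself.
        refs = [profiles[g] for g in members if g != gene and g in profiles]
        if not refs:
            continue
        score = sum(cosine(query, r) for r in refs) / len(refs)
        if score >= threshold:
            # Multi-label: a gene may pass the threshold for several terms.
            predicted[term] = score
    return predicted


# Toy data (entirely made up): three phenotype features per gene.
profiles = {
    "geneA": [1.0, 0.0, 0.0],
    "geneB": [0.9, 0.1, 0.0],
    "geneC": [0.0, 1.0, 0.0],
    "geneD": [0.0, 0.9, 0.1],
    "geneX": [1.0, 0.05, 0.0],  # gene with no known annotation
}
annotations = {
    "term_1": ["geneA", "geneB"],
    "term_2": ["geneC", "geneD"],
}

predicted = predict_functions(profiles, annotations, "geneX")
print(sorted(predicted))  # only term_1 passes the threshold for geneX
```

Scoring each term independently lets a single gene receive several functions, which mirrors the "multiple context-specific functions" described above. A supervised classifier trained per term would be a natural replacement for the averaged similarity used here.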