Leventhal Emily L, Daamen Andrea R, Grammer Amrie C, Lipsky Peter E
AMPEL BioSolutions LLC, and the RILITE Research Institute, Charlottesville, VA 22902, USA.
iScience. 2023 Sep 25;26(10):108042. doi: 10.1016/j.isci.2023.108042. eCollection 2023 Oct 20.
Machine learning (ML) has the potential to identify subsets of patients with distinct phenotypes from gene expression data. However, phenotype prediction using ML has often relied on identifying important genes without a systems biology context. To address this, we created an interpretable ML approach based on blood transcriptomics to predict phenotype in systemic lupus erythematosus (SLE), a heterogeneous autoimmune disease. We employed a sequential grouped feature importance algorithm to assess how well gene sets known to be abnormal in SLE, including immune and metabolic pathways and cell types, predict disease activity and organ involvement. Gene sets related to interferon, tumor necrosis factor, the mitoribosome, and T cell activation were the best predictors of phenotype, with excellent performance. These results suggest potential relationships between the molecular pathways identified in each model and manifestations of SLE. This ML approach to phenotype prediction can be applied to other diseases and tissues.
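The sequential grouped feature importance idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a greedy forward selection over gene sets: at each step the gene set that most improves predictive performance is added, and the procedure stops when the gain falls below a threshold. The scoring model (a nearest-centroid classifier with leave-one-out accuracy), the toy data, the gene set names, and every function name here are hypothetical choices made for illustration; this is not the authors' pipeline.

```python
import random

def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on columns `cols`."""
    if not cols:
        # No features selected yet: fall back to the majority-class rate.
        return max(y.count(c) for c in set(y)) / len(y)
    correct = 0
    for i in range(len(X)):
        by_class = {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j != i:  # hold out sample i
                by_class.setdefault(label, []).append([row[c] for c in cols])
        cents = {label: centroid(rows) for label, rows in by_class.items()}
        point = [X[i][c] for c in cols]
        pred = min(cents, key=lambda label: sq_dist(point, cents[label]))
        correct += pred == y[i]
    return correct / len(X)

def sequential_grouped_importance(X, y, groups, min_gain=0.01):
    """Greedily add the feature group with the largest accuracy gain.

    Returns a list of (group_name, accuracy_after_adding) in selection order.
    """
    cols, history = [], []
    best = loo_accuracy(X, y, cols)
    remaining = dict(groups)
    while remaining:
        scores = {name: loo_accuracy(X, y, cols + idx)
                  for name, idx in remaining.items()}
        name = max(scores, key=scores.get)
        if scores[name] - best < min_gain:
            break  # no remaining group improves performance enough
        cols = cols + remaining.pop(name)
        best = scores[name]
        history.append((name, best))
    return history

# Toy data (hypothetical): 40 samples, binary phenotype, 8 "genes".
random.seed(0)
y = [i % 2 for i in range(40)]
X = []
for label in y:
    informative = [random.gauss(4.0 * label, 1.0) for _ in range(3)]
    uninformative = [random.gauss(0.0, 1.0) for _ in range(5)]
    X.append(informative + uninformative)

# Hypothetical gene sets mapped to column indices of X.
groups = {
    "interferon": [0, 1, 2],   # shifted between phenotypes in the toy data
    "t_cell": [3, 4],          # pure noise in the toy data
    "mitoribosome": [5, 6, 7], # pure noise in the toy data
}

history = sequential_grouped_importance(X, y, groups)
for name, acc in history:
    print(f"added {name}: leave-one-out accuracy = {acc:.3f}")
```

Scoring whole gene sets rather than single genes keeps the result interpretable at the pathway level, which is the motivation stated in the abstract. In practice the classifier, the performance metric, and the stopping threshold would all need to be chosen and validated for the data at hand.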