Machine learning approaches to predict lupus disease activity from gene expression data.

Affiliations

RILITE Research Institute and AMPEL BioSolutions, 250 W Main St, Ste 300, Charlottesville, VA, 22902, USA.

Department of Physics, George Washington University, Washington, DC, 20052, USA.

Publication information

Sci Rep. 2019 Jul 3;9(1):9617. doi: 10.1038/s41598-019-45989-0.

DOI:10.1038/s41598-019-45989-0

PMID:31270349

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC6610624/

Abstract

The integration of gene expression data to predict systemic lupus erythematosus (SLE) disease activity is a significant challenge because of the high degree of heterogeneity among patients and study cohorts, especially those collected on different microarray platforms. Here we deployed machine learning approaches to integrate gene expression data from three SLE data sets and used it to classify patients as having active or inactive disease as characterized by standard clinical composite outcome measures. Both raw whole blood gene expression data and informative gene modules generated by Weighted Gene Co-expression Network Analysis from purified leukocyte populations were employed with various classification algorithms. Classifiers were evaluated by 10-fold cross-validation across three combined data sets or by training and testing in independent data sets, the latter of which amplified the effects of technical variation. A random forest classifier achieved a peak classification accuracy of 83 percent under 10-fold cross-validation, but its performance could be severely affected by technical variation among data sets. The use of gene modules rather than raw gene expression was more robust, achieving classification accuracies of approximately 70 percent regardless of how the training and testing sets were formed. Fine-tuning the algorithms and parameter sets may generate sufficient accuracy to be informative as a standalone estimate of disease activity.
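The two evaluation schemes named in the abstract (10-fold cross-validation over the pooled data sets versus training and testing on independent data sets) can be contrasted with a minimal sketch. Everything below is an illustrative assumption rather than the authors' pipeline: the data are synthetic, the per-cohort `batch_shift` offset is a crude stand-in for platform-level technical variation, and a simple nearest-centroid classifier is swapped in for the random forest so that the example needs only the Python standard library. The printed accuracies say nothing about the published results.

```python
import random

def make_dataset(rng, n, n_features, batch_shift):
    """Synthetic cohort (hypothetical): label 1 ("active") raises the first
    5 features by 1.0; batch_shift is added to every feature to mimic a
    platform-level offset."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) + batch_shift
               + (1.0 if label == 1 and j < 5 else 0.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid 'training': store the mean feature vector per class."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(centroids, key=dist)

def accuracy(centroids, X, y):
    hits = sum(predict(centroids, x) == t for x, t in zip(X, y))
    return hits / len(y)

def kfold_indices(n, k, rng):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def pooled_cv_accuracy(datasets, k, rng):
    """Scheme 1: pool all cohorts, then run k-fold cross-validation."""
    X = [x for Xd, _ in datasets for x in Xd]
    y = [t for _, yd in datasets for t in yd]
    scores = []
    for fold in kfold_indices(len(y), k, rng):
        held = set(fold)
        train = [i for i in range(len(y)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in fold], [y[i] for i in fold]))
    return sum(scores) / k

def cross_dataset_accuracy(datasets):
    """Scheme 2: train on all cohorts but one, test on the held-out cohort."""
    scores = []
    for h in range(len(datasets)):
        X_tr = [x for j, (Xd, _) in enumerate(datasets) if j != h for x in Xd]
        y_tr = [t for j, (_, yd) in enumerate(datasets) if j != h for t in yd]
        model = fit_centroids(X_tr, y_tr)
        X_te, y_te = datasets[h]
        scores.append(accuracy(model, X_te, y_te))
    return scores

rng = random.Random(0)
# Three synthetic cohorts with different (assumed) platform offsets.
datasets = [make_dataset(rng, 40, 20, shift) for shift in (0.0, 0.5, 2.0)]
pooled = pooled_cv_accuracy(datasets, k=10, rng=rng)
held_out = cross_dataset_accuracy(datasets)
print("pooled 10-fold accuracy:", round(pooled, 3))
print("held-out cohort accuracies:", [round(s, 3) for s in held_out])
```

In the pooled scheme every training fold contains samples from all three cohorts, so the classifier sees each cohort's offset during training. In the held-out scheme the test cohort's offset is never seen, which is the setting the abstract describes as amplifying technical variation.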

Similar articles

Identification of alterations in macrophage activation associated with disease activity in systemic lupus erythematosus.

PLoS One. 2018 Dec 18;13(12):e0208132. doi: 10.1371/journal.pone.0208132. eCollection 2018.

Analysis of transcriptomic features reveals molecular endotypes of SLE with clinical implications.

Genome Med. 2023 Oct 16;15(1):84. doi: 10.1186/s13073-023-01237-9.

Full high-throughput sequencing analysis of differences in expression profiles of long noncoding RNAs and their mechanisms of action in systemic lupus erythematosus.

Arthritis Res Ther. 2019 Mar 5;21(1):70. doi: 10.1186/s13075-019-1853-7.

Machine learning models to predict the progression from early to late stages of papillary renal cell carcinoma.

Comput Biol Med. 2018 Sep 1;100:92-99. doi: 10.1016/j.compbiomed.2018.06.030. Epub 2018 Jun 28.

Meta-analysis of microarray data using a pathway-based approach identifies a 37-gene expression signature for systemic lupus erythematosus in human peripheral blood mononuclear cells.

BMC Med. 2011 May 30;9:65. doi: 10.1186/1741-7015-9-65.

Can survival prediction be improved by merging gene expression data sets?

PLoS One. 2009 Oct 23;4(10):e7431. doi: 10.1371/journal.pone.0007431.

Identification of circular RNAs hsa_circ_0044235 and hsa_circ_0068367 as novel biomarkers for systemic lupus erythematosus.

Int J Mol Med. 2019 Oct;44(4):1462-1472. doi: 10.3892/ijmm.2019.4302. Epub 2019 Aug 5.

Key genes and functional coexpression modules involved in the pathogenesis of systemic lupus erythematosus.

J Cell Physiol. 2018 Nov;233(11):8815-8825. doi: 10.1002/jcp.26795. Epub 2018 May 28.

Interpretable machine learning identifies paediatric Systemic Lupus Erythematosus subtypes based on gene expression data.

Sci Rep. 2022 May 6;12(1):7433. doi: 10.1038/s41598-022-10853-1.

Cited by

Generative prediction of causal gene sets responsible for complex traits.

Proc Natl Acad Sci U S A. 2025 Jun 17;122(24):e2415071122. doi: 10.1073/pnas.2415071122. Epub 2025 Jun 12.

A serum biomarker panel and miniarray detection system for tracking disease activity and flare risk in lupus nephritis.

Front Immunol. 2025 May 1;16:1541907. doi: 10.3389/fimmu.2025.1541907. eCollection 2025.

Application of machine learning in assessing disease activity in SLE.

Lupus Sci Med. 2025 Apr 8;12(1):e001456. doi: 10.1136/lupus-2024-001456.

Multimodal AI/ML for discovering novel biomarkers and predicting disease using multi-omics profiles of patients with cardiovascular diseases.

Sci Rep. 2024 Nov 3;14(1):26503. doi: 10.1038/s41598-024-78553-6.

Screening and validating genes associated with cuproptosis in systemic lupus erythematosus by expression profiling combined with machine learning.

Biomol Biomed. 2025 Mar 7;25(4):965-975. doi: 10.17305/bb.2024.10996.

Emerging Strategies in Drug Development and Clinical Care in the Era of Personalized and Precision Medicine.

Pharmaceutics. 2024 Aug 22;16(8):1107. doi: 10.3390/pharmaceutics16081107.

Systemic lupus in the era of machine learning medicine.

Lupus Sci Med. 2024 Mar 4;11(1):e001140. doi: 10.1136/lupus-2023-001140.

Novel multiclass classification machine learning approach for the early-stage classification of systemic autoimmune rheumatic diseases.

Lupus Sci Med. 2024 Jan 31;11(1):e001125. doi: 10.1136/lupus-2023-001125.

Discovering biomarkers associated and predicting cardiovascular disease with high accuracy using a novel nexus of machine learning techniques for precision medicine.

Sci Rep. 2024 Jan 2;14(1):1. doi: 10.1038/s41598-023-50600-8.

IntelliGenes: a novel machine learning pipeline for biomarker discovery and predictive analysis using multi-genomic profiles.

Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad755.
