Department of Critical Care Medicine, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, 510120, China.
Guangzhou Institute of Respiratory Health, Guangzhou, 510120, China.
J Transl Med. 2023 Sep 12;21(1):620. doi: 10.1186/s12967-023-04499-4.
A significant proportion of septic patients with acute lung injury (ALI) are recognized late because no efficient diagnostic test exists, which delays treatment and consequently raises mortality. Identifying diagnostic biomarkers may improve screening so that septic patients at high risk of ALI are identified earlier, and may point to potentially effective therapeutic drugs. Machine learning is a powerful approach for making sense of complex gene expression data and finding robust diagnostic biomarkers for ALI.
The datasets were obtained from the GEO and ArrayExpress databases. Following quality control and normalization, three datasets (GSE66890, GSE10474 and GSE32707) were merged as the training set, and four machine learning feature selection methods (elastic net, SVM, random forest and XGBoost) were applied to construct the diagnostic model. The remaining datasets served as validation sets. To further evaluate the performance and predictive value of the diagnostic model, a nomogram, decision curve analysis (DCA) and a clinical impact curve (CIC) were constructed. Finally, potential small-molecule compounds interacting with the selected features were explored in the CTD database.
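The consensus step described above can be sketched as a simple vote count over the gene lists returned by each selector. This is a minimal illustration, not the authors' pipeline: the function name `consensus_features`, the `min_votes` parameter, and the placeholder gene names are all assumptions introduced here, and the lists stand in for whatever each method would actually select.

```python
from collections import Counter


def consensus_features(selections, min_votes=None):
    """Return features chosen by at least `min_votes` selectors.

    `selections` maps a method name to the features that method kept.
    With the default `min_votes=None`, a feature must be kept by every method.
    """
    if min_votes is None:
        min_votes = len(selections)
    votes = Counter()
    for chosen in selections.values():
        # set() ensures each method contributes at most one vote per feature
        votes.update(set(chosen))
    return sorted(gene for gene, count in votes.items() if count >= min_votes)


# Hypothetical selector outputs (placeholder names, not data from the study)
selections = {
    "elastic_net": ["GENE_A", "GENE_B", "GENE_C"],
    "svm": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "random_forest": ["GENE_A", "GENE_B", "GENE_C", "GENE_E"],
    "xgboost": ["GENE_C", "GENE_A"],
}

all_four = consensus_features(selections)
at_least_three = consensus_features(selections, min_votes=3)
```

Requiring agreement from every method gives the strictest gene set; lowering `min_votes` trades stringency for a larger candidate pool.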
Gene set enrichment analysis (GSEA) suggested that immune response and metabolism might play an important role in the pathogenesis of sepsis-induced ALI. Then, 52 genes were identified as putative biomarkers by consensus feature selection across the four methods. Among them, 5 genes (ARHGDIB, ALDH1A1, TACR3, TREM1 and PI3) were selected by all methods and used to predict ALI diagnosis with high accuracy. In the external datasets E-MTAB-5273 and E-MTAB-5274, the diagnostic model achieved AUC values of 0.725 and 0.833, respectively. In addition, the nomogram, DCA and CIC showed that the diagnostic model performed well and had predictive value. Finally, five small-molecule compounds (curcumin, tretinoin, acetaminophen, estradiol and dexamethasone) were screened as potential therapeutic agents for sepsis-induced ALI.
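For readers unfamiliar with the AUC values reported above, the metric can be computed directly from its rank interpretation: the probability that a randomly chosen ALI case receives a higher model score than a randomly chosen non-ALI case, with ties counted as one half. The sketch below is a plain-Python illustration under that definition; the function name `roc_auc` and the toy labels and scores are assumptions made for the example and are unrelated to the study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    `labels` holds 1 for cases (e.g. ALI) and 0 for controls;
    `scores` holds the model's predicted risk for each sample.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not from the validation cohorts)
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives context for values such as 0.725 and 0.833.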
The consensus of multiple machine learning algorithms identified 5 genes that were able to distinguish septic patients with ALI from those without. The diagnostic model could identify septic patients at high risk of ALI and suggests potential therapeutic targets for sepsis-induced ALI.