Shapley additive explanations based feature selection reveals CXCL14 as a key immune-related gene in predicting idiopathic pulmonary fibrosis.

Author information

Chen Bin, Huan Lu, Lu Junyu, Yuan Jinhe

Affiliations

Department of Geriatric Palliative Medicine, Chongqing Liangjiang New District People Hospital, Chongqing, China.

Department of Respiratory and Critical Care Medicine, Renji Hospital, School of Medicine, Chongqing University, Chongqing, China.

Publication information

Front Med (Lausanne). 2025 Aug 6;12:1608078. doi: 10.3389/fmed.2025.1608078. eCollection 2025.

Abstract

BACKGROUND

Idiopathic pulmonary fibrosis (IPF) is a progressive lung disease marked by excessive fibrous tissue accumulation in the lung interstitium, leading to a gradual deterioration of respiratory function and significantly impairing patients' quality of life. Despite advances in understanding its etiology and pathogenesis, the exact mechanisms remain unclear, underscoring the need for novel biomarkers and therapeutic targets.

METHODS

We analyzed five publicly available datasets from the Gene Expression Omnibus (GEO), specifically "GSE15197," "GSE53845," "GSE135065," "GSE185691," and "GSE195770," to identify gene expression changes associated with IPF. Data were annotated and normalized to minimize batch effects and technical variability. Principal Component Analysis (PCA) verified preprocessing efficacy. Differentially expressed genes (DEGs) were identified using linear modeling. Core DEGs were selected via integrative analysis across datasets.
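As a rough illustration of the DEG step, the sketch below fits a two-group linear model per gene and keeps the genes that pass in every dataset. The abstract does not name the software, the thresholds, or the integration rule, so the plain OLS fit (not a moderated statistic), the cutoffs `min_abs_lfc` and `min_abs_t`, the intersection rule, and all gene names and values in `TOY_DATASETS` are assumptions made for illustration only.

```python
from math import copysign, inf, sqrt
from statistics import mean


def two_group_fit(control, ipf):
    """OLS fit of expression ~ intercept + group (0 = control, 1 = IPF).

    With one binary covariate the slope is the difference in group means
    (a log2 fold change when the inputs are log2 expression values) and its
    t-statistic uses the pooled residual variance.
    """
    n0, n1 = len(control), len(ipf)
    m0, m1 = mean(control), mean(ipf)
    slope = m1 - m0
    rss = sum((y - m0) ** 2 for y in control) + sum((y - m1) ** 2 for y in ipf)
    se = sqrt(rss / (n0 + n1 - 2) * (1 / n0 + 1 / n1))
    if se == 0:  # no within-group variation: t is undefined, treat as extreme
        return slope, (0.0 if slope == 0 else copysign(inf, slope))
    return slope, slope / se


def degs(dataset, min_abs_lfc=1.0, min_abs_t=3.0):
    """Genes passing both (assumed) thresholds in one dataset."""
    hits = set()
    for gene, (control, ipf) in dataset.items():
        lfc, t = two_group_fit(control, ipf)
        if abs(lfc) >= min_abs_lfc and abs(t) >= min_abs_t:
            hits.add(gene)
    return hits


def core_degs(datasets):
    """Genes called as DEGs in every dataset (assumed integration rule)."""
    return set.intersection(*(degs(d) for d in datasets.values()))


# Made-up log2 expression values: gene -> (control samples, IPF samples).
# These are NOT data from the GEO series named in the abstract.
TOY_DATASETS = {
    "toy_A": {
        "GENE_UP": ([5.0, 5.2, 4.8], [7.0, 7.3, 6.9]),
        "GENE_FLAT": ([6.0, 6.1, 5.9], [6.0, 6.2, 5.8]),
        "GENE_ONE": ([3.0, 3.1, 2.9], [5.0, 5.2, 4.9]),
    },
    "toy_B": {
        "GENE_UP": ([4.0, 4.4, 4.2], [6.1, 6.5, 6.0]),
        "GENE_FLAT": ([5.5, 5.7, 5.6], [5.6, 5.5, 5.8]),
        "GENE_ONE": ([3.0, 3.2, 3.1], [3.1, 3.0, 3.3]),
    },
}

core = core_degs(TOY_DATASETS)
print(sorted(core))  # prints ['GENE_UP']
```

Here `GENE_ONE` is dropped because it passes in only one toy dataset. A real pipeline would also correct the per-gene p-values for multiple testing, which this sketch omits.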

RESULTS

Our analysis revealed DEGs that are substantially linked to crucial biological processes such as extracellular matrix organization and immune response regulation. Integrative analysis of five GEO datasets identified CXCL14, MMP7, and MDK as core differentially expressed genes in the final predictive model. Using Least Absolute Shrinkage and Selection Operator (LASSO) regression and Random Forest, we constructed a logistic regression model with robust predictive performance, achieving an AUC of 0.92 in the training cohort and 0.89 in the validation cohort, with sensitivity of 88% and specificity of 85%. The Shapley Additive Explanations (SHAP) method identified CXCL14 (mean SHAP value = 0.38) as the most influential feature, followed by MMP7 and MDK. Functional enrichment analyses highlighted significant enrichment of TGF-β signaling, extracellular matrix organization, and chemokine signaling pathways. Immune infiltration analysis revealed positive correlations between CXCL14 expression and alveolar macrophage/activated fibroblast populations, while SHAP interaction analysis identified synergistic effects between CXCL14 and TGF-β1 in driving fibrosis.
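To make the SHAP ranking concrete, the following sketch computes exact Shapley values for a three-gene logistic model by enumerating all feature coalitions, then ranks genes by mean absolute value. The coefficients, intercept, and expression values are hypothetical and were chosen only so the arithmetic is easy to follow; they are not the fitted model or the data from the study. Replacing absent features with the background mean and working in log-odds space are also assumptions of this sketch, and the reported mean SHAP value may have been computed on a different scale.

```python
from itertools import combinations
from math import factorial
from statistics import mean

GENES = ["CXCL14", "MMP7", "MDK"]
# Hypothetical logistic-regression parameters (not the published model).
WEIGHTS = {"CXCL14": 1.2, "MMP7": 0.8, "MDK": 0.5}
INTERCEPT = -0.3


def log_odds(sample):
    """Model output before the sigmoid."""
    return INTERCEPT + sum(WEIGHTS[g] * sample[g] for g in GENES)


def shapley_values(sample, background):
    """Exact Shapley values of log_odds for one sample.

    A coalition's value is the model output with coalition genes taken from
    the sample and all other genes set to the background value.
    """
    n = len(GENES)

    def value(coalition):
        mixed = {g: sample[g] if g in coalition else background[g] for g in GENES}
        return log_odds(mixed)

    phi = {}
    for g in GENES:
        others = [h for h in GENES if h != g]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {g}) - value(s))
        phi[g] = total
    return phi


# Made-up standardized expression values (illustration only).
TOY_SAMPLES = [
    {"CXCL14": 1.0, "MMP7": 0.5, "MDK": -0.2},
    {"CXCL14": -1.0, "MMP7": -0.5, "MDK": 0.2},
    {"CXCL14": 0.5, "MMP7": 1.0, "MDK": 0.4},
    {"CXCL14": -0.5, "MMP7": -1.0, "MDK": -0.4},
]
BACKGROUND = {g: mean(s[g] for s in TOY_SAMPLES) for g in GENES}

all_phi = [shapley_values(s, BACKGROUND) for s in TOY_SAMPLES]
mean_abs = {g: mean(abs(p[g]) for p in all_phi) for g in GENES}
ranking = sorted(GENES, key=lambda g: mean_abs[g], reverse=True)
print(ranking)
```

Because the model is linear in log-odds space, each gene's value reduces to its coefficient times the deviation from the background mean, and the values for a sample sum to the difference between that sample's output and the background output. The ordering printed here follows from the invented numbers and should not be read as a reproduction of the reported result.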

CONCLUSION

These findings substantiate the hypothesis that IPF pathogenesis is closely linked to extracellular matrix remodeling and immune dysregulation. This suggests that future investigations should delve deeper into the practical applications of identified biomarkers in the early diagnosis and management of IPF. Furthermore, the machine learning-based predictive model demonstrates strong clinical potential and merits further validation in prospective trials to assess its utility and therapeutic implications in real-world settings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6cab/12364669/0e81d92a7dbe/fmed-12-1608078-g001.jpg
