Machine learning reveals CAT gene as a novel potential diagnostic and prognostic biomarker in non-small cell lung cancer.

Authors

Tian Yi, Zhao Wen-Ya, Liu Yi-Ru, Song Wen-Wen, Lin Qiao-Xin, Gong Yan-Na, Deng Yi-Ting, Gu Dian-Na, Tian Ling

Affiliations

Department of Central Laboratory, Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200030, China.

Department of Medical Oncology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.

Publication information

Discov Oncol. 2024 Dec 18;15(1):774. doi: 10.1007/s12672-024-01670-1.

Abstract

BACKGROUND

Non-small cell lung cancer (NSCLC) is one of the most prevalent forms of lung cancer, with a five-year survival rate of 21.7%. There is an urgent need to identify biomarkers that can inform tumor diagnosis and prognosis, particularly biomarkers that remain applicable across different age groups. Here, we apply machine learning methods to analyze biomarker applicability across age groups in NSCLC.

METHODS

Studies have shown a higher incidence of NSCLC in people over 40 years of age; because of dataset limitations, individuals under 40 years of age were not included in this study. To approximate human aging as closely as possible, we collected NSCLC samples from the UCSC Xena database according to patient age and categorized them into three groups: 40-60, 60-80, and over 80 years old. We then employed four machine learning methods (Random Forest, LASSO regression, XGBoost, and GBM) to identify gene sets with significant diagnostic value for each age group. By taking the intersection of these sets, we identified the optimal gene and assessed its prognostic significance in NSCLC. The diagnostic value of the CAT gene was then validated using public datasets from three regions: GSE32863, GSE43458, GSE68571, GSE10072, and GSE63459 from the Americas; GSE30219 and GSE102511 from Europe; and GSE31210 and GSE19804 from Asia. Furthermore, immunohistochemical staining was performed on an independent cohort from a tissue microarray, and cell culture and RT-qPCR were used for external validation.
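The stratify-then-intersect step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the handling of the shared bin boundaries (ages 60 and 80), and the gene sets are all assumptions made for illustration, and the toy gene sets are invented placeholders rather than results from the study. Because set intersection is associative, the order in which methods and age groups are intersected does not change the final consensus.

```python
def age_group(age):
    """Map an age in years to one of the three strata (None if under 40).

    Assumption: 60 is assigned to "60-80" and 80 to "60-80"; the abstract
    does not say how the shared boundaries were handled.
    """
    if age < 40:
        return None
    if age < 60:
        return "40-60"
    if age <= 80:
        return "60-80"
    return "over 80"


def consensus_genes(selected):
    """Intersect gene sets across methods within each age group, then across groups.

    `selected` maps age group -> {method name -> set of gene symbols}.
    Every age group must contain at least one method.
    """
    per_group = {
        group: set.intersection(*by_method.values())
        for group, by_method in selected.items()
    }
    overall = set.intersection(*per_group.values())
    return per_group, overall


# Invented placeholder selections (NOT data from the study).
toy = {
    "40-60": {
        "RandomForest": {"CAT", "GENE_A", "GENE_B"},
        "LASSO": {"CAT", "GENE_A"},
        "XGBoost": {"CAT", "GENE_A", "GENE_C"},
        "GBM": {"CAT", "GENE_A"},
    },
    "60-80": {
        "RandomForest": {"CAT", "GENE_B"},
        "LASSO": {"CAT", "GENE_D"},
        "XGBoost": {"CAT", "GENE_B"},
        "GBM": {"CAT"},
    },
    "over 80": {
        "RandomForest": {"CAT", "GENE_A"},
        "LASSO": {"CAT", "GENE_A"},
        "XGBoost": {"CAT"},
        "GBM": {"CAT", "GENE_E"},
    },
}

per_group, overall = consensus_genes(toy)
print(per_group["40-60"])  # genes all four methods agree on in the 40-60 stratum
print(overall)             # genes shared by every method in every stratum
```

In practice each method's gene set would come from a fitted model (for example, nonzero LASSO coefficients or top-ranked feature importances); how those sets were thresholded is not stated in the abstract.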

RESULTS

Using these machine learning methods, we identified the catalase (CAT) gene. Individuals with high CAT expression had improved survival and elevated immune scores. We further found that the CAT gene synergizes with multiple components of neutrophils, including TLRs, FcRn, and the selective GEF of Rho-family GTPases. In addition, we identified a potential immune checkpoint, TNFSF15, that is applicable to the human aging model. We validated the diagnostic value of CAT using datasets from the Americas, Europe, and Asia. External RT-qPCR validation confirmed that CAT expression was higher in BEAS-2B cells than in A549 cells. In an independent human cohort, we also confirmed that CAT expression is low in lung cancer tissues. Finally, higher CAT levels were associated with improved survival in the 40-60 and 60-80 age groups.
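As an illustration of how an RT-qPCR comparison between two cell lines is commonly quantified, the sketch below uses the standard 2^(-ΔΔCt) relative quantification formula. The abstract does not state which quantification method, reference gene, or Ct values were used, so the function name, the reference gene, and all numbers here are hypothetical, and the formula assumes roughly 100% amplification efficiency.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in `sample` relative to `control` (2^-ddCt)."""
    d_ct_sample = ct_target_sample - ct_ref_sample      # normalize to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control
    return 2 ** -dd_ct


# Invented Ct values (NOT measurements from the study):
# sample = tumor cell line, control = non-tumor cell line.
fold = relative_expression(
    ct_target_sample=26.0, ct_ref_sample=18.0,    # hypothetical tumor line
    ct_target_control=24.0, ct_ref_control=18.0,  # hypothetical control line
)
print(fold)  # a value below 1 means lower target expression in the sample
```

A fold change below 1 would correspond to the direction reported above (lower CAT expression in the tumor line than in the control line); the magnitude in this toy example carries no meaning.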

CONCLUSIONS

In our analysis of the NSCLC database, we identified the CAT gene, which shows promise for diagnostic and prognostic applications in the context of human aging. It may also offer insights into the age-related heterogeneity of NSCLC.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/540f/11655766/af672c436bc3/12672_2024_1670_Fig1_HTML.jpg
