Lung adenocarcinoma and lung squamous cell carcinoma cancer classification, biomarker identification, and gene expression analysis using overlapping feature selection methods.

Affiliation

California University of Science and Medicine, Colton, CA, USA.

Publication information

Sci Rep. 2021 Jun 25;11(1):13323. doi: 10.1038/s41598-021-92725-8.

Abstract

Lung cancer is one of the deadliest cancers in the world. Two of the most common subtypes, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), have drastically different biological signatures, yet they are often treated similarly and classified together as non-small cell lung cancer (NSCLC). LUAD and LUSC biomarkers are scarce, and their distinct biological mechanisms have yet to be elucidated. To detect biologically relevant markers, many studies have attempted to improve traditional machine learning algorithms or develop novel algorithms for biomarker discovery. However, few have used overlapping machine learning or feature selection methods for cancer classification, biomarker identification, or gene expression analysis. This study proposes to use overlapping traditional feature selection or feature reduction techniques for cancer classification and biomarker discovery. The genes selected by the overlapping method were then verified using random forest. The classification statistics of the overlapping method were compared to those of the traditional feature selection methods. The identified biomarkers were validated in an external dataset using AUC and ROC analysis. Gene expression analysis was then performed to further investigate biological differences between LUAD and LUSC. Overall, our method achieved classification results comparable to, if not better than, the traditional algorithms. It also identified multiple known biomarkers, and five potentially novel biomarkers with high discriminating values between LUAD and LUSC. Many of the biomarkers also exhibit significant prognostic potential, particularly in LUAD. Our study also unraveled distinct biological pathways between LUAD and LUSC.
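The overlapping idea described in the abstract can be illustrated with a minimal sketch: rank genes with several independent criteria, keep only the genes that every criterion places in its top k, then check how well each retained gene separates the two classes by AUC. Everything below is an assumption made for illustration and is not taken from the paper: the three scoring functions, the value of k, the synthetic expression values, and the placeholder gene names (`GENE_0`, `GENE_1`, and so on). The abstract does not name the specific feature selection methods used, and the random forest verification and external-dataset validation steps are not reproduced here; the AUC is computed on the same synthetic data purely to show the calculation.

```python
import random
import statistics


def mean_diff(a, b):
    """Absolute difference in mean expression between the two groups."""
    return abs(statistics.mean(a) - statistics.mean(b))


def welch_t(a, b):
    """Absolute Welch t-statistic (unequal variances)."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return mean_diff(a, b) / se


def auc(pos, neg):
    """AUC as the probability that a value from pos exceeds one from neg (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_distance(a, b):
    """Distance of the single-gene AUC from chance (0.5), ignoring direction."""
    return abs(auc(a, b) - 0.5)


def overlapping_selection(group_a, group_b, k, scorers):
    """Return the genes ranked in the top k by every scorer (set intersection)."""
    selected = set(group_a)
    for scorer in scorers:
        scores = {g: scorer(group_a[g], group_b[g]) for g in group_a}
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        selected &= set(top)
    return selected


# Synthetic expression data (hypothetical): 20 genes, 30 samples per class.
# Only GENE_0 and GENE_1 are shifted between the two classes.
rng = random.Random(0)
genes = [f"GENE_{i}" for i in range(20)]
informative = {"GENE_0", "GENE_1"}
luad = {g: [rng.gauss(3.0 if g in informative else 0.0, 1.0) for _ in range(30)]
        for g in genes}
lusc = {g: [rng.gauss(0.0, 1.0) for _ in range(30)] for g in genes}

selected = overlapping_selection(
    luad, lusc, k=5, scorers=[mean_diff, welch_t, auc_distance]
)

# Report the per-gene AUC of each gene that survived the overlap.
for g in sorted(selected):
    print(g, round(auc(luad[g], lusc[g]), 3))
```

Taking the intersection rather than the union trades recall for robustness: a gene survives only if it ranks highly under every criterion, which tends to filter out genes favored by the quirks of a single method.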

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9f4e/8233431/59563c817f03/41598_2021_92725_Fig1_HTML.jpg
