Machine learning based combination of multi-omics data for subgroup identification in non-small cell lung cancer.

Author information

Department of Electrical Engineering, Indian Institute of Technology Dharwad, Dharwad, India.

Department of Biosciences and Bioengineering, Indian Institute of Technology Dharwad, Dharwad, India.

Publication information

Sci Rep. 2023 Mar 21;13(1):4636. doi: 10.1038/s41598-023-31426-w.

Abstract

Non-small cell lung cancer (NSCLC) is a heterogeneous disease with a poor prognosis. Identifying novel subtypes can help group patients with similar molecular and clinical phenotypes. This work proposes an end-to-end pipeline for subgroup identification in NSCLC. We used a machine learning (ML) based approach to compress multi-omics NSCLC data into a lower-dimensional space. The compressed data were then subjected to consensus K-means clustering, which identified five novel clusters (C1-C5). Survival analysis revealed a significant difference in overall survival among the clusters (p-value: 0.019). Each cluster was then characterized molecularly to identify its distinguishing features. Cluster C3 showed minimal genetic aberration and a favorable prognosis. Next, classification models were developed on data from each omic level to predict the subgroup of unseen patients. These classifiers were then combined into decision-level fused classification models, which assign unseen patients to one of the five novel clusters. The multi-omics-based classification model outperformed single-omic-based models, and the combination of classifiers proved more accurate than the individual classifiers. In summary, we used ML models to develop a classification method and identified five novel NSCLC clusters with distinct genetic and clinical characteristics.
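
Two steps of the pipeline described above, consensus clustering and decision-level fusion, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `consensus_matrix` and `fuse_decisions`, the omic names (`mRNA`, `miRNA`, `methylation`), the equal default weights, and all numbers are hypothetical, chosen only for illustration. The sketch assumes that repeated K-means runs on resampled data have already produced one label list per run, and that each single-omic classifier outputs a probability for each of the five clusters. Fusion is done here by averaging those probabilities, which is one common decision-level rule; the abstract does not state which rule was used.

```python
from itertools import combinations

CLUSTERS = ["C1", "C2", "C3", "C4", "C5"]


def consensus_matrix(label_runs):
    """Fraction of runs in which each pair of samples shares a cluster.

    label_runs: list of runs; each run holds one cluster label per sample,
    or None where the sample was left out of that resampled run.
    """
    n = len(label_runs[0])
    together = [[0] * n for _ in range(n)]  # runs where i and j co-cluster
    sampled = [[0] * n for _ in range(n)]   # runs where both i and j appear
    for labels in label_runs:
        for i, j in combinations(range(n), 2):
            if labels[i] is None or labels[j] is None:
                continue
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    matrix = []
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                row.append(1.0)
            elif sampled[i][j]:
                row.append(together[i][j] / sampled[i][j])
            else:
                row.append(0.0)  # pair never drawn together
        matrix.append(row)
    return matrix


def fuse_decisions(probs_by_omic, weights=None):
    """Decision-level fusion: weighted average of per-omic class probabilities."""
    names = list(probs_by_omic)
    if weights is None:
        weights = {name: 1.0 for name in names}  # assumption: equal weights
    total = sum(weights[name] for name in names)
    fused = [
        sum(weights[name] * probs_by_omic[name][k] for name in names) / total
        for k in range(len(CLUSTERS))
    ]
    best = max(range(len(CLUSTERS)), key=fused.__getitem__)
    return CLUSTERS[best], fused


# Hypothetical labels from three resampled clustering runs over four patients.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, None],
    [0, 1, 1, 1],
]
cm = consensus_matrix(runs)

# Hypothetical per-omic classifier outputs for one unseen patient.
patient = {
    "mRNA":        [0.10, 0.20, 0.50, 0.10, 0.10],
    "miRNA":       [0.30, 0.30, 0.20, 0.10, 0.10],
    "methylation": [0.05, 0.15, 0.60, 0.10, 0.10],
}
label, fused = fuse_decisions(patient)
print(label, [round(p, 3) for p in fused])
```

In a full consensus clustering workflow, the consensus matrix would typically be clustered once more (for example, hierarchically) to obtain the final stable groups. The pairwise comparison used here is insensitive to label permutations between runs, which is why raw K-means labels can be compared directly.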

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce5c/10030850/b852c415c01f/41598_2023_31426_Fig1_HTML.jpg
