
Multicategory Survival Outcomes Classification via Overlapping Group Screening Process Based on Multinomial Logistic Regression Model With Application to TCGA Transcriptomic Data.

Author information

Wang Jie-Huei, Hou Po-Lin, Chen Yi-Hau

Affiliations

Department of Mathematics, National Chung Cheng University, Chiayi City, Taiwan.

Institute of Statistical Science, Academia Sinica, Taipei, Taiwan.

Publication information

Cancer Inform. 2024 Oct 8;23:11769351241286710. doi: 10.1177/11769351241286710. eCollection 2024.

Abstract

OBJECTIVES

When classifying the multicategory survival outcomes of cancer patients, it is crucial to identify biomarkers that affect specific outcome categories. The classification of multicategory survival outcomes from transcriptomic data has been studied extensively in computational biology. Nevertheless, several challenges remain, including the ultra-high-dimensional feature space, feature contamination, and data imbalance, all of which contribute to the instability of diagnostic models. Furthermore, although most methods achieve accurate predictive performance for binary classification with high-dimensional transcriptomic data, extending them to multi-class classification is not straightforward.

METHODS

We employ the One-versus-One strategy to transform the multi-class classification problem into multiple binary classification problems, and apply the overlapping group screening procedure with binary logistic regression to incorporate pathway information when identifying important genes and gene-gene interactions associated with multicategory survival outcomes.
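The abstract does not give the screening statistic, so the following is only a hypothetical, simplified sketch of the two ideas in the Methods paragraph: One-versus-One decomposition, and screening of overlapping pathways with binary logistic regression. It assumes that each pathway is scored by a likelihood-ratio statistic from a ridge-penalized logistic model on the pathway's genes plus their within-pathway pairwise products, and that the top `keep` pathways are retained for each class pair. All function names, the pathways `P1` to `P4`, and the simulated data are illustrative inventions, not the authors' code.

```python
import itertools

import numpy as np


def fit_ridge_logistic(X, y, lam=1.0, n_iter=50):
    """Ridge-penalized binary logistic regression fitted by Newton-Raphson.

    Returns the coefficients (intercept first) and the log-likelihood
    of the data at those coefficients.
    """
    n, p = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])
    penalty = lam * np.eye(p + 1)
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Xb @ beta)))
        grad = Xb.T @ (y - prob) - penalty @ beta
        hess = (Xb * (prob * (1.0 - prob))[:, None]).T @ Xb + penalty
        beta = beta + np.linalg.solve(hess, grad)
    prob = np.clip(1.0 / (1.0 + np.exp(-(Xb @ beta))), 1e-12, 1.0 - 1e-12)
    loglik = np.sum(y * np.log(prob) + (1.0 - y) * np.log(1.0 - prob))
    return beta, loglik


def pathway_features(X, genes):
    """Main effects plus all within-pathway gene-gene products."""
    cols = [X[:, g] for g in genes]
    cols += [X[:, g] * X[:, h] for g, h in itertools.combinations(genes, 2)]
    return np.column_stack(cols)


def screen_pathways(X, y, pathways, keep=2, lam=1.0):
    """Rank possibly overlapping pathways by a likelihood-ratio statistic.

    Hypothetical, simplified screening rule; not the authors' exact procedure.
    """
    p0 = np.clip(y.mean(), 1e-12, 1.0 - 1e-12)
    null_ll = len(y) * (p0 * np.log(p0) + (1.0 - p0) * np.log(1.0 - p0))
    scores = {}
    for name, genes in pathways.items():
        _, ll = fit_ridge_logistic(pathway_features(X, genes), y, lam)
        scores[name] = 2.0 * (ll - null_ll)
    return sorted(scores, key=scores.get, reverse=True)[:keep]


def one_vs_one_screening(X, labels, pathways, keep=2):
    """Run the pathway screening separately for every pair of classes."""
    classes = sorted(int(c) for c in set(labels.tolist()))
    selected = {}
    for a, b in itertools.combinations(classes, 2):
        mask = (labels == a) | (labels == b)
        y = (labels[mask] == b).astype(float)  # 1 = class b, 0 = class a
        selected[(a, b)] = screen_pathways(X[mask], y, pathways, keep)
    return selected


# Hypothetical demo data: 300 patients, 12 genes, 3 outcome classes.
rng = np.random.default_rng(0)
n, p = 300, 12
X = rng.standard_normal((n, p))
# Overlapping pathways: gene 3 is in P1 and P2, gene 7 is in P3 and P4.
pathways = {"P1": [0, 1, 2, 3], "P2": [3, 4, 5], "P3": [6, 7, 8], "P4": [7, 9, 10, 11]}
# Class 1 depends on gene 0; class 2 on the gene 6 x gene 8 interaction.
logits = np.column_stack([np.zeros(n), 2.0 * X[:, 0], 2.0 * X[:, 6] * X[:, 8]])
labels = np.argmax(logits + rng.gumbel(size=logits.shape), axis=1)  # multinomial draw

selected = one_vs_one_screening(X, labels, pathways, keep=2)
for pair, top in selected.items():
    print(pair, top)
```

Because each pathway is scored on its own, a gene shared by two pathways (gene 3 or gene 7 here) can be retained through either of them, and each class pair can retain a different set of pathways.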

RESULTS

We conduct a series of simulation studies to compare the classification accuracy of our proposed approach with that of several existing machine learning methods. In practical data applications, we use a random oversampling procedure to address class imbalance. We then apply the proposed method to transcriptomic data from several cancers in The Cancer Genome Atlas (TCGA), including kidney renal papillary cell carcinoma, lung adenocarcinoma, and head and neck squamous cell carcinoma. Our aim is to establish an accurate microarray-based multicategory cancer diagnosis model. The numerical results show that the proposed method improves cancer diagnosis compared with approaches that ignore pathway information.
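Random oversampling, as generally understood, duplicates randomly drawn patients from each minority class (sampling with replacement) until every class is as large as the majority class. The minimal NumPy sketch below assumes that "balanced" means equal class sizes; the function name `random_oversample` is hypothetical. It is usually applied to the training split only, so that duplicated patients do not also appear in the evaluation data.

```python
import numpy as np


def random_oversample(X, y, rng=None):
    """Duplicate random minority-class rows until all classes match the largest one."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for c, k in zip(classes, counts):
        members = np.flatnonzero(y == c)
        # Draw (target - k) extra rows of this class with replacement.
        extra = rng.choice(members, size=target - k, replace=True)
        keep.append(np.concatenate([members, extra]))
    idx = np.concatenate(keep)
    return X[idx], y[idx]


# Toy example: feature value i identifies row i, so duplicates are easy to trace.
y = np.array([0] * 6 + [1] * 3 + [2] * 1)
X = np.arange(10, dtype=float).reshape(-1, 1)
X_bal, y_bal = random_oversample(X, y, rng=0)
print(np.unique(y_bal, return_counts=True))
```

The original rows are always kept, so oversampling never discards information; it only reweights the minority classes by duplication.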

CONCLUSIONS

We demonstrate the effectiveness of the proposed method in terms of class prediction accuracy through evaluations on simulated datasets and real data applications. We also identify cancer-related gene-gene interaction biomarkers and report the corresponding network structure. Based on the identified major genes and gene-gene interactions, we can predict, for each patient, the probability of belonging to each survival outcome class.
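The abstract does not say how the One-versus-One binary models are combined into per-class probabilities. One simple option, shown here as an assumption rather than the authors' rule, is a closed-form pairwise coupling: if r_ij = p_i / (p_i + p_j) holds exactly, then the sum over j ≠ i of 1 / r_ij equals K − 2 + 1 / p_i, so p_i = 1 / (sum_j 1 / r_ij − (K − 2)). Estimated r_ij are rarely perfectly consistent, so the result is renormalized. The function `couple_pairwise` and its input format are hypothetical.

```python
import numpy as np


def couple_pairwise(K, pair_probs, eps=1e-9):
    """Combine One-versus-One probabilities into K class probabilities.

    pair_probs[(a, b)], for a < b, is a binary model's estimated probability
    that the patient belongs to class a rather than class b; every pair must
    be present. Uses p_i = 1 / (sum_{j != i} 1 / r_ij - (K - 2)), then
    renormalizes because estimated pairwise probabilities are rarely consistent.
    """
    r = np.full((K, K), np.nan)
    for (a, b), prob in pair_probs.items():
        prob = min(max(prob, eps), 1.0 - eps)  # avoid division by zero
        r[a, b] = prob
        r[b, a] = 1.0 - prob
    p = np.empty(K)
    for i in range(K):
        others = [j for j in range(K) if j != i]
        p[i] = 1.0 / (np.sum(1.0 / r[i, others]) - (K - 2))
    return p / p.sum()


# Pairwise probabilities that are exactly consistent with p = (0.5, 0.3, 0.2).
pair_probs = {(0, 1): 0.5 / 0.8, (0, 2): 0.5 / 0.7, (1, 2): 0.3 / 0.5}
p_hat = couple_pairwise(3, pair_probs)
print(p_hat)  # approximately [0.5, 0.3, 0.2]
```

Since every r_ij is at most 1, each denominator is at least 1, so every p_i lies in (0, 1] before renormalization. Iterative coupling schemes exist that handle inconsistent pairwise estimates more carefully.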


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9626/11462568/792172e94d7f/10.1177_11769351241286710-fig1.jpg
