Sparse group penalized integrative analysis of multiple cancer prognosis datasets.

Author information

Liu Jin, Huang Jian, Xie Yang, Ma Shuangge

Affiliation

Division of Epidemiology and Biostatistics, UIC School of Public Health, 1603 W Taylor Street, MC 923, Chicago, IL 60612-4394, USA.

Publication information

Genet Res (Camb). 2013 Jun;95(2-3):68-77. doi: 10.1017/S0016672313000086.

Abstract

In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Owing to the 'large d, small n' characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyses multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. Most existing integrative analyses assume the homogeneity model, which postulates that different datasets share the same set of markers. Several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the accelerated failure time model to describe survival. This model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group minimax concave penalty approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. A simulation study shows that it outperforms the existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of the heterogeneity model and the proposed approach.
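
The abstract does not spell out the penalty, so the following is only a minimal sketch of what a sparse group minimax concave penalty could look like, not the authors' implementation. It assumes the standard closed form of the minimax concave penalty (MCP), and it assumes that the sparse group version adds an MCP on the norm of each marker's coefficients across datasets to an MCP on each individual coefficient. The function names, the `beta[j][m]` layout, the square-root group-size scaling, and the shared concavity parameter `gamma` are all hypothetical choices made for illustration.

```python
import math


def mcp(t, lam, gamma):
    """Standard minimax concave penalty evaluated at a scalar t.

    Quadratic taper for |t| <= gamma * lam, constant beyond that point.
    """
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def sparse_group_mcp(beta, lam1, lam2, gamma=3.0):
    """Illustrative sparse group MCP value (assumed form, see lead-in).

    beta[j][m] is the coefficient of marker j in dataset m (hypothetical layout).
    The group term acts on the norm of each marker's coefficients across
    datasets; the individual term acts on each coefficient separately, which
    is what lets a marker be selected in some datasets but not in others.
    """
    total = 0.0
    for group in beta:
        norm = math.sqrt(sum(b * b for b in group))
        # Assumption: scale the group tuning parameter by sqrt(group size).
        total += mcp(norm, math.sqrt(len(group)) * lam1, gamma)
        total += sum(mcp(b, lam2, gamma) for b in group)
    return total
```

Under this assumed form, a marker whose coefficients are all zero contributes nothing to the penalty, while a marker with one zero and one non-zero coefficient is penalized through both terms. That combination is how a penalty of this kind can accommodate the heterogeneity model described above.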
