Promoting similarity of model sparsity structures in integrative analysis of cancer genetic data.

Authors

Huang Yuan, Liu Jin, Yi Huangdi, Shia Ben-Chang, Ma Shuangge

Affiliations

VA Cooperative Studies Program Coordinating Center, West Haven, CT; Department of Biostatistics, Yale University, New Haven, CT, U.S.A.

Center of Quantitative Medicine, Duke-NUS Medical School, Singapore.

Publication information

Stat Med. 2017 Feb 10;36(3):509-559. doi: 10.1002/sim.7138. Epub 2016 Sep 25.

Abstract

In profiling studies, the analysis of a single dataset often leads to unsatisfactory results because of the small sample size. Multi-dataset analysis utilizes information from multiple independent datasets and outperforms single-dataset analysis. Among the available multi-dataset analysis methods, integrative analysis methods aggregate and analyze raw data and outperform meta-analysis methods, which analyze multiple datasets separately and then pool summary statistics. In this study, we conduct integrative analysis and marker selection under the heterogeneity structure, which allows different datasets to have overlapping but not necessarily identical sets of markers. Under certain scenarios, it is reasonable to expect some similarity of identified marker sets, or equivalently similarity of model sparsity structures, across multiple datasets. However, the existing methods do not have a mechanism to explicitly promote such similarity. To tackle this problem, we develop a sparse boosting method. This method uses a BIC/HDBIC criterion to select weak learners in boosting and encourages sparsity. A new penalty is introduced to promote the similarity of model sparsity structures across datasets. The proposed method has an intuitive formulation and is broadly applicable and computationally affordable. In numerical studies, we analyze right-censored survival data under the accelerated failure time model. Simulation shows that the proposed method outperforms alternative boosting and penalization methods with more accurate marker identification. The analysis of three breast cancer prognosis datasets shows that the proposed method can identify marker sets with increased similarity across datasets and improved prediction performance. Copyright © 2016 John Wiley & Sons, Ltd.
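
The abstract gives no implementation details, so the following Python sketch is only an illustration of the general ideas and is not the authors' method. It assumes uncensored log survival times (the paper handles right censoring), fits each dataset separately with componentwise L2 boosting, and picks the stopping iteration by a BIC-type score in which the degrees of freedom are approximated by the number of selected covariates. It does not include the paper's similarity penalty or the HDBIC variant. The function names `sparse_boost`, `support`, and `jaccard`, the simulated data, and the Jaccard index as a similarity measure are all assumptions introduced here.

```python
import numpy as np


def sparse_boost(X, y, steps=200, nu=0.1):
    """Componentwise L2 boosting; returns coefficients at the best BIC-type score.

    Simplification (assumption): degrees of freedom are approximated by the
    number of distinct covariates selected so far.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)                  # centre covariates
    norms = (Xc ** 2).sum(axis=0)
    beta = np.zeros(p)
    resid = y - y.mean()
    best_bic, best_beta = np.inf, beta.copy()
    for _ in range(steps):
        coefs = Xc.T @ resid / norms         # univariate least-squares fits
        j = int(np.argmax(coefs ** 2 * norms))   # covariate with largest RSS drop
        beta[j] += nu * coefs[j]             # small step along that weak learner
        resid = resid - nu * coefs[j] * Xc[:, j]
        bic = n * np.log(resid @ resid / n) + np.count_nonzero(beta) * np.log(n)
        if bic < best_bic:
            best_bic, best_beta = bic, beta.copy()
    return best_beta


def support(beta, tol=1e-8):
    """Indices of selected markers, i.e. the sparsity structure of one model."""
    return {int(j) for j in np.flatnonzero(np.abs(beta) > tol)}


def jaccard(a, b):
    """Overlap of two marker sets: |a & b| / |a | b| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical heterogeneity structure: three datasets with overlapping,
# but not identical, sets of true markers.
rng = np.random.default_rng(0)
marker_sets = [[0, 1, 2, 3], [0, 1, 2, 4], [0, 1, 2, 5]]
supports = []
for markers in marker_sets:
    X = rng.normal(size=(200, 50))
    log_time = 2.0 * X[:, markers].sum(axis=1) + rng.normal(size=200)
    supports.append(support(sparse_boost(X, log_time)))

for a in range(3):
    for b in range(a + 1, 3):
        print(a, b, round(jaccard(supports[a], supports[b]), 2))
```

The pairwise Jaccard values only measure how similar the separately selected marker sets are; the method described in the abstract goes further by penalizing dissimilarity during the fit itself.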

Similar articles

Integrative sparse principal component analysis of gene expression data.
Genet Epidemiol. 2017 Dec;41(8):844-865. doi: 10.1002/gepi.22089. Epub 2017 Nov 8.
Identification of cancer genomic markers via integrative sparse boosting.
Biostatistics. 2012 Jul;13(3):509-22. doi: 10.1093/biostatistics/kxr033. Epub 2011 Oct 31.

Cited by

Integrative Interaction Analysis using Threshold Gradient Directed Regularization.
Appl Stoch Models Bus Ind. 2019 Mar-Apr;35(2):354-375. doi: 10.1002/asmb.2342. Epub 2018 May 29.
An integrative sparse boosting analysis of cancer genomic commonality and difference.
Stat Methods Med Res. 2020 May;29(5):1325-1337. doi: 10.1177/0962280219859026. Epub 2019 Jul 7.
An Update on Statistical Boosting in Biomedicine.
Comput Math Methods Med. 2017;2017:6083072. doi: 10.1155/2017/6083072. Epub 2017 Aug 2.

References

Integrative Analysis of Cancer Diagnosis Studies with Composite Penalization.
Scand Stat Theory Appl. 2014 Mar 1;41(1):87-103. doi: 10.1111/j.1467-9469.2012.00816.x.
Identification of cancer genomic markers via integrative sparse boosting.
Biostatistics. 2012 Jul;13(3):509-22. doi: 10.1093/biostatistics/kxr033. Epub 2011 Oct 31.
