Promoting similarity of model sparsity structures in integrative analysis of cancer genetic data.

Authors

Huang Yuan, Liu Jin, Yi Huangdi, Shia Ben-Chang, Ma Shuangge

Affiliations

VA Cooperative Studies Program Coordinating Center, West Haven, CT; Department of Biostatistics, Yale University, New Haven, CT, U.S.A.

Center of Quantitative Medicine, Duke-NUS Medical School, Singapore.

Publication information

Stat Med. 2017 Feb 10;36(3):509-559. doi: 10.1002/sim.7138. Epub 2016 Sep 25.

Abstract

In profiling studies, the analysis of a single dataset often leads to unsatisfactory results because of the small sample size. Multi-dataset analysis uses information from multiple independent datasets and outperforms single-dataset analysis. Among the available multi-dataset analysis methods, integrative analysis methods aggregate and analyze raw data and outperform meta-analysis methods, which analyze multiple datasets separately and then pool summary statistics. In this study, we conduct integrative analysis and marker selection under the heterogeneity structure, which allows different datasets to have overlapping but not necessarily identical sets of markers. Under certain scenarios, it is reasonable to expect some similarity of identified marker sets (or, equivalently, similarity of model sparsity structures) across multiple datasets. However, the existing methods do not have a mechanism to explicitly promote such similarity. To tackle this problem, we develop a sparse boosting method. This method uses a BIC/HDBIC criterion to select weak learners in boosting and encourages sparsity. A new penalty is introduced to promote the similarity of model sparsity structures across datasets. The proposed method has an intuitive formulation and is broadly applicable and computationally affordable. In numerical studies, we analyze right-censored survival data under the accelerated failure time model. Simulation shows that the proposed method outperforms alternative boosting and penalization methods with more accurate marker identification. The analysis of three breast cancer prognosis datasets shows that the proposed method can identify marker sets with increased similarity across datasets and improved prediction performance. Copyright © 2016 John Wiley & Sons, Ltd.
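
To make the idea in the abstract concrete, the following is a minimal toy sketch, not the authors' algorithm. It runs componentwise L2 boosting on several datasets that share the same covariates, picks each weak learner by a BIC-style score, and adds a stand-in similarity term. Everything specific here is an assumption made for illustration: the function name `sparse_boost`, the parameters `nu` and `lam`, the crude degrees-of-freedom count, the fixed number of steps, and above all the form of the similarity penalty, which the abstract does not specify. Censoring is ignored; in an accelerated failure time setting `y` would be log survival times and would need a censoring adjustment.

```python
import numpy as np


def sparse_boost(datasets, n_steps=50, nu=0.1, lam=0.0):
    """Toy integrative sparse boosting over a list of (X, y) pairs.

    All datasets are assumed to share the same p covariates (columns).
    lam weights a HYPOTHETICAL similarity penalty, not the paper's: newly
    activating covariate j in one dataset costs lam for every other
    dataset in which j is still inactive.
    """
    p = datasets[0][0].shape[1]
    betas = [np.zeros(p) for _ in datasets]
    for _ in range(n_steps):
        for m, (X, y) in enumerate(datasets):
            n = len(y)
            resid = y - X @ betas[m]
            active = betas[m] != 0
            sq_norm = np.sum(X ** 2, axis=0)
            # Least-squares fit of each one-covariate weak learner to the residuals.
            coef = X.T @ resid / sq_norm
            # Residual sum of squares after a shrunken step (size nu) along each covariate.
            rss = resid @ resid - (2 * nu - nu ** 2) * coef ** 2 * sq_norm
            # Crude degrees of freedom (an assumption): active covariates, plus one if j is new.
            df = active.sum() + (~active).astype(int)
            bic = n * np.log(rss / n) + np.log(n) * df
            # Number of other datasets in which each covariate is still inactive.
            inactive_elsewhere = sum(
                (betas[k] == 0).astype(int) for k in range(len(datasets)) if k != m
            )
            score = bic + lam * inactive_elsewhere * (~active)
            j = int(np.argmin(score))
            betas[m][j] += nu * coef[j]
    return betas


# Simulated example: two datasets whose true markers are covariates 0, 1, 2.
rng = np.random.default_rng(0)
true_beta = np.zeros(20)
true_beta[:3] = 2.0
datasets = []
for _ in range(2):
    X = rng.standard_normal((100, 20))
    datasets.append((X, X @ true_beta + rng.standard_normal(100)))

betas = sparse_boost(datasets, lam=2.0)
# Sparsity structure of each dataset-specific model: indices of nonzero coefficients.
supports = [set(np.flatnonzero(b).tolist()) for b in betas]
print(supports)
```

Each dataset keeps its own coefficient vector, which matches the heterogeneity structure described above: the selected marker sets may overlap without being forced to coincide. Setting `lam=0.0` in this sketch reduces it to running BIC-guided boosting on each dataset independently.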

Similar articles

1. Promoting similarity of model sparsity structures in integrative analysis of cancer genetic data. Stat Med. 2017 Feb 10;36(3):509-559. doi: 10.1002/sim.7138. Epub 2016 Sep 25.
2. Promoting Similarity of Sparsity Structures in Integrative Analysis with Penalization. J Am Stat Assoc. 2017;112(517):342-350. doi: 10.1080/01621459.2016.1139497. Epub 2017 May 3.
3. Integrative analysis of prognosis data on multiple cancer subtypes. Biometrics. 2014 Sep;70(3):480-8. doi: 10.1111/biom.12177. Epub 2014 Apr 25.
4. Identification of breast cancer prognosis markers using integrative sparse boosting. Methods Inf Med. 2012;51(2):152-61. doi: 10.3414/ME11-02-0019. Epub 2012 Feb 20.
5. Integrative analysis of high-throughput cancer studies with contrasted penalization. Genet Epidemiol. 2014 Feb;38(2):144-51. doi: 10.1002/gepi.21781. Epub 2014 Jan 6.
6. Integrative sparse principal component analysis of gene expression data. Genet Epidemiol. 2017 Dec;41(8):844-865. doi: 10.1002/gepi.22089. Epub 2017 Nov 8.
7. Integrative analysis of multiple cancer genomic datasets under the heterogeneity model. Stat Med. 2013 Sep 10;32(20):3509-21. doi: 10.1002/sim.5780. Epub 2013 Mar 21.
8. Sparse group penalized integrative analysis of multiple cancer prognosis datasets. Genet Res (Camb). 2013 Jun;95(2-3):68-77. doi: 10.1017/S0016672313000086.
9. Identification of cancer genomic markers via integrative sparse boosting. Biostatistics. 2012 Jul;13(3):509-22. doi: 10.1093/biostatistics/kxr033. Epub 2011 Oct 31.
10. Gene network-based cancer prognosis analysis with sparse boosting. Genet Res (Camb). 2012 Aug;94(4):205-21. doi: 10.1017/S0016672312000419.

Cited by

1. Integrative Interaction Analysis using Threshold Gradient Directed Regularization. Appl Stoch Models Bus Ind. 2019 Mar-Apr;35(2):354-375. doi: 10.1002/asmb.2342. Epub 2018 May 29.
2. Robust semiparametric gene-environment interaction analysis using sparse boosting. Stat Med. 2019 Oct 15;38(23):4625-4641. doi: 10.1002/sim.8322. Epub 2019 Jul 29.
3. An integrative sparse boosting analysis of cancer genomic commonality and difference. Stat Methods Med Res. 2020 May;29(5):1325-1337. doi: 10.1177/0962280219859026. Epub 2019 Jul 7.
4. Penalized integrative semiparametric interaction analysis for multiple genetic datasets. Stat Med. 2019 Jul 30;38(17):3221-3242. doi: 10.1002/sim.8172. Epub 2019 Apr 16.
5. Identification of cancer omics commonality and difference via community fusion. Stat Med. 2019 Mar 30;38(7):1200-1212. doi: 10.1002/sim.8027. Epub 2018 Nov 12.
6. Overlapping clustering of gene expression data using penalized weighted normalized cut. Genet Epidemiol. 2018 Dec;42(8):796-811. doi: 10.1002/gepi.22164. Epub 2018 Oct 9.
7. An Update on Statistical Boosting in Biomedicine. Comput Math Methods Med. 2017;2017:6083072. doi: 10.1155/2017/6083072. Epub 2017 Aug 2.
