Promoting similarity of model sparsity structures in integrative analysis of cancer genetic data.

Authors

Huang Yuan, Liu Jin, Yi Huangdi, Shia Ben-Chang, Ma Shuangge

Affiliations

VA Cooperative Studies Program Coordinating Center, West Haven, CT; Department of Biostatistics, Yale University, New Haven, CT, U.S.A.

Center of Quantitative Medicine, Duke-NUS Medical School, Singapore.

Publication information

Stat Med. 2017 Feb 10;36(3):509-559. doi: 10.1002/sim.7138. Epub 2016 Sep 25.

DOI:10.1002/sim.7138

PMID:27667129

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5209260/

Abstract

In profiling studies, the analysis of a single dataset often leads to unsatisfactory results because of the small sample size. Multi-dataset analysis utilizes information of multiple independent datasets and outperforms single-dataset analysis. Among the available multi-dataset analysis methods, integrative analysis methods aggregate and analyze raw data and outperform meta-analysis methods, which analyze multiple datasets separately and then pool summary statistics. In this study, we conduct integrative analysis and marker selection under the heterogeneity structure, which allows different datasets to have overlapping but not necessarily identical sets of markers. Under certain scenarios, it is reasonable to expect some similarity of identified marker sets - or equivalently, similarity of model sparsity structures - across multiple datasets. However, the existing methods do not have a mechanism to explicitly promote such similarity. To tackle this problem, we develop a sparse boosting method. This method uses a BIC/HDBIC criterion to select weak learners in boosting and encourages sparsity. A new penalty is introduced to promote the similarity of model sparsity structures across datasets. The proposed method has an intuitive formulation and is broadly applicable and computationally affordable. In numerical studies, we analyze right-censored survival data under the accelerated failure time model. Simulation shows that the proposed method outperforms alternative boosting and penalization methods with more accurate marker identification. The analysis of three breast cancer prognosis datasets shows that the proposed method can identify marker sets with increased similarity across datasets and improved prediction performance. Copyright © 2016 John Wiley & Sons, Ltd.
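
As a rough illustration of the BIC-guided weak-learner selection mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a single dataset with an uncensored continuous response, uses componentwise least-squares weak learners, and approximates model complexity by the number of selected covariates. The cross-dataset similarity penalty, the HDBIC variant, and the adjustments for right censoring under the accelerated failure time model are all omitted. The names `bic_value` and `sparse_boost`, the step size, and the simulated data are hypothetical choices made for this example.

```python
import math
import random


def bic_value(resid, df):
    """BIC-type criterion: n * log(RSS / n) + df * log(n)."""
    n = len(resid)
    rss = max(sum(r * r for r in resid), 1e-12)  # guard against log(0)
    return n * math.log(rss / n) + df * math.log(n)


def sparse_boost(X, y, step=0.1, max_iter=200):
    """Componentwise L2 boosting with BIC-based weak-learner selection.

    X is a list of n rows with p covariates each; y is a list of n responses.
    No intercept is fitted, so both are assumed to be roughly centered.
    Assumption: degrees of freedom = number of covariates selected so far.
    Returns (coefficients, criterion) at the iteration with the smallest BIC.
    """
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    col_ss = [sum(v * v for v in col) for col in cols]
    beta = [0.0] * p
    resid = list(y)
    best_beta, best_bic = list(beta), bic_value(resid, 0)
    for _ in range(max_iter):
        best = None
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # a constant-zero covariate cannot be fitted
            # Least-squares fit of the current residual on covariate j alone.
            slope = sum(x * r for x, r in zip(cols[j], resid)) / col_ss[j]
            new_resid = [r - step * slope * x for x, r in zip(cols[j], resid)]
            # Model size if covariate j is the one updated at this step.
            df = sum(1 for k in range(p) if beta[k] != 0.0 or k == j)
            crit = bic_value(new_resid, df)
            if best is None or crit < best[0]:
                best = (crit, j, slope, new_resid)
        if best is None:
            break
        crit, j, slope, resid = best
        beta[j] += step * slope
        if crit < best_bic:
            best_bic, best_beta = crit, list(beta)
    return best_beta, best_bic


# Simulated example: only covariates 0 and 3 carry signal.
random.seed(0)
n, p = 100, 8
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[3] + random.gauss(0, 0.5) for row in X]
coef, bic = sparse_boost(X, y)
selected = [j for j, b in enumerate(coef) if b != 0.0]
print("selected covariates:", selected)
```

Because adding a new covariate costs an extra `log(n)` in the criterion while updating an already selected covariate does not, the selection step favors reusing covariates that are already in the model, which is what keeps the fitted model sparse in this sketch.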

Similar articles

J Am Stat Assoc. 2017;112(517):342-350. doi: 10.1080/01621459.2016.1139497. Epub 2017 May 3.

Integrative analysis of prognosis data on multiple cancer subtypes.

Biometrics. 2014 Sep;70(3):480-8. doi: 10.1111/biom.12177. Epub 2014 Apr 25.

Identification of breast cancer prognosis markers using integrative sparse boosting.

Methods Inf Med. 2012;51(2):152-61. doi: 10.3414/ME11-02-0019. Epub 2012 Feb 20.

Integrative analysis of high-throughput cancer studies with contrasted penalization.

Genet Epidemiol. 2014 Feb;38(2):144-51. doi: 10.1002/gepi.21781. Epub 2014 Jan 6.

Integrative sparse principal component analysis of gene expression data.

Genet Epidemiol. 2017 Dec;41(8):844-865. doi: 10.1002/gepi.22089. Epub 2017 Nov 8.

Integrative analysis of multiple cancer genomic datasets under the heterogeneity model.

Stat Med. 2013 Sep 10;32(20):3509-21. doi: 10.1002/sim.5780. Epub 2013 Mar 21.

Sparse group penalized integrative analysis of multiple cancer prognosis datasets.

Genet Res (Camb). 2013 Jun;95(2-3):68-77. doi: 10.1017/S0016672313000086.

Identification of cancer genomic markers via integrative sparse boosting.

Biostatistics. 2012 Jul;13(3):509-22. doi: 10.1093/biostatistics/kxr033. Epub 2011 Oct 31.

Gene network-based cancer prognosis analysis with sparse boosting.

Genet Res (Camb). 2012 Aug;94(4):205-21. doi: 10.1017/S0016672312000419.

Cited by

Integrative Interaction Analysis using Threshold Gradient Directed Regularization.

Appl Stoch Models Bus Ind. 2019 Mar-Apr;35(2):354-375. doi: 10.1002/asmb.2342. Epub 2018 May 29.

Robust semiparametric gene-environment interaction analysis using sparse boosting.

Stat Med. 2019 Oct 15;38(23):4625-4641. doi: 10.1002/sim.8322. Epub 2019 Jul 29.

An integrative sparse boosting analysis of cancer genomic commonality and difference.

Stat Methods Med Res. 2020 May;29(5):1325-1337. doi: 10.1177/0962280219859026. Epub 2019 Jul 7.

Penalized integrative semiparametric interaction analysis for multiple genetic datasets.

Stat Med. 2019 Jul 30;38(17):3221-3242. doi: 10.1002/sim.8172. Epub 2019 Apr 16.

Identification of cancer omics commonality and difference via community fusion.

Stat Med. 2019 Mar 30;38(7):1200-1212. doi: 10.1002/sim.8027. Epub 2018 Nov 12.

Overlapping clustering of gene expression data using penalized weighted normalized cut.

Genet Epidemiol. 2018 Dec;42(8):796-811. doi: 10.1002/gepi.22164. Epub 2018 Oct 9.

An Update on Statistical Boosting in Biomedicine.

Comput Math Methods Med. 2017;2017:6083072. doi: 10.1155/2017/6083072. Epub 2017 Aug 2.
