Identification of cancer genomic markers via integrative sparse boosting.

Affiliation

Department of Statistics, Penn State University, 301 Thomas Building, State College, PA 16801, USA.

Publication information

Biostatistics. 2012 Jul;13(3):509-22. doi: 10.1093/biostatistics/kxr033. Epub 2011 Oct 31.

DOI:10.1093/biostatistics/kxr033

PMID:22045909

Full-text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC3577103/

Abstract

In high-throughput cancer genomic studies, markers identified from the analysis of single data sets often suffer a lack of reproducibility because of the small sample sizes. An ideal solution is to conduct large-scale prospective studies, which are extremely expensive and time consuming. A cost-effective remedy is to pool data from multiple comparable studies and conduct integrative analysis. Integrative analysis of multiple data sets is challenging because of the high dimensionality of genomic measurements and heterogeneity among studies. In this article, we propose a sparse boosting approach for marker identification in integrative analysis of multiple heterogeneous cancer diagnosis studies with gene expression measurements. The proposed approach can effectively accommodate the heterogeneity among multiple studies and identify markers with consistent effects across studies. Simulation shows that the proposed approach has satisfactory identification results and outperforms alternatives including an intensity approach and meta-analysis. The proposed approach is used to identify markers of pancreatic cancer and liver cancer.
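The abstract does not give algorithmic details, so the following is only a minimal sketch of the general idea, not the authors' method. It uses plain componentwise gradient boosting with logistic loss, in which a gene is selected jointly across all studies but receives a separate coefficient in each study, which is one simple way to allow heterogeneous effect sizes. The function name `integrative_boost`, its parameters, the synthetic data, and the fixed number of boosting steps (used here in place of any sparsity criterion the paper may define) are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def integrative_boost(X_list, y_list, n_steps=30, nu=0.1):
    """Componentwise logistic boosting over several studies (illustrative).

    X_list: list of (n_m, p) arrays sharing the same p genes.
    y_list: list of 0/1 label vectors, one per study.
    Returns an (M, p) coefficient matrix with one row per study.
    """
    M, p = len(X_list), X_list[0].shape[1]
    beta = np.zeros((M, p))
    # Column sums of squares, reused in every least-squares fit.
    col_ss = [np.sum(X ** 2, axis=0) + 1e-12 for X in X_list]
    for _ in range(n_steps):
        gain = np.zeros(p)
        step = np.zeros((M, p))
        for m, (X, y) in enumerate(zip(X_list, y_list)):
            resid = y - sigmoid(X @ beta[m])      # negative gradient of logistic loss
            xr = X.T @ resid
            step[m] = xr / col_ss[m]              # per-gene least-squares fit to the residual
            gain += xr ** 2 / col_ss[m] / len(y)  # squared-error reduction per sample
        j = int(np.argmax(gain))                  # gene with the largest pooled gain
        beta[:, j] += nu * step[:, j]             # study-specific update for that gene
    return beta

# Synthetic example (assumed data): three studies, same informative genes,
# different effect sizes to mimic between-study heterogeneity.
rng = np.random.default_rng(0)
p, informative = 50, [0, 1, 2]
X_list, y_list = [], []
for effect in (1.5, 2.0, 2.5):
    X = rng.standard_normal((100, p))
    logits = effect * X[:, informative].sum(axis=1)
    y = (rng.random(100) < sigmoid(logits)).astype(float)
    X_list.append(X)
    y_list.append(y)

beta = integrative_boost(X_list, y_list)
selected = np.flatnonzero(np.any(beta != 0, axis=0))
print("selected genes:", selected)
```

Because selection is made at the gene level and the update is applied to every study at once, a selected gene has a nonzero coefficient in all studies, while the magnitudes are free to differ. A real analysis would also need an intercept, standardized covariates, and a data-driven stopping or sparsity rule.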
