Meta-analysis based variable selection for gene expression data.

Authors

Li Quefeng, Wang Sijian, Huang Chiang-Ching, Yu Menggang, Shao Jun

Affiliation

Department of Statistics, University of Wisconsin, Madison, Wisconsin, U.S.A.

Publication information

Biometrics. 2014 Dec;70(4):872-80. doi: 10.1111/biom.12213. Epub 2014 Sep 5.

DOI:10.1111/biom.12213

PMID:25196635

Abstract

Recent advances in biotechnology and their wide application have led to the generation of many high-dimensional gene expression data sets that can be used to address similar biological questions. Meta-analysis plays an important role in summarizing and synthesizing scientific evidence from multiple studies. When the data sets are high-dimensional, it is desirable to incorporate variable selection into meta-analysis to improve model interpretation and prediction. To our knowledge, all existing methods conduct variable selection with meta-analyzed data in an "all-in-or-all-out" fashion, that is, a gene is either selected in all studies or not selected in any study. However, because of the heterogeneity that commonly exists in meta-analyzed data, including differences in choice of biospecimens, study population, and measurement sensitivity, a gene may be important in some studies and unimportant in others. In this article, we propose a novel method called meta-lasso for variable selection with high-dimensional meta-analyzed data. Through a hierarchical decomposition of the regression coefficients, our method not only borrows strength across multiple data sets to boost the power to identify important genes, but also keeps the selection flexible across data sets to account for data heterogeneity. We show that our method possesses gene selection consistency, that is, when the sample size of each data set is large, our method can, with high probability, identify all important genes and remove all unimportant genes. Simulation studies demonstrate the good performance of our method. We applied our meta-lasso method to a meta-analysis of five cardiovascular studies. The analysis results are clinically meaningful.
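The abstract does not spell out the hierarchical decomposition, so the following is only a minimal sketch under a stated assumption: each study-specific coefficient is written as the product of a gene-level factor shared by all studies and a study-level factor. The names `gamma`, `xi`, `compose_coefficients`, and `selected_genes`, as well as all numeric values, are hypothetical and chosen for illustration. The sketch shows why such a product form allows both joint and study-specific selection. It is not an implementation of the estimation procedure.

```python
from typing import List, Set


def compose_coefficients(gamma: List[float], xi: List[List[float]]) -> List[List[float]]:
    """Return beta[m][j] = gamma[j] * xi[m][j] for study m and gene j.

    Assumed form of the decomposition: gamma[j] is shared across studies,
    xi[m][j] is specific to study m.
    """
    return [[g * x for g, x in zip(gamma, row)] for row in xi]


def selected_genes(beta: List[List[float]], tol: float = 1e-12) -> List[Set[int]]:
    """For each study, return the indices of genes with a nonzero coefficient."""
    return [{j for j, b in enumerate(row) if abs(b) > tol} for row in beta]


# Hypothetical values: 3 studies (rows of xi), 4 genes (columns).
gamma = [1.5, 0.0, 2.0, 0.7]      # gene 1 has gamma = 0: removed from every study
xi = [
    [0.8, 0.3, 0.5, 0.0],         # study 0: gene 3 switched off locally
    [1.1, -0.2, 0.0, 0.0],        # study 1: genes 2 and 3 switched off locally
    [0.9, 0.4, -0.6, 1.2],        # study 2: no local zeros
]

beta = compose_coefficients(gamma, xi)
selection = selected_genes(beta)

for m, genes in enumerate(selection):
    print(f"study {m}: selected genes {sorted(genes)}")
```

Under this assumed form, setting a gene-level factor to zero drops the gene from all studies at once, which is how information is pooled across data sets, while a zero study-level factor drops the gene from that study only, which is what separates this behaviour from "all-in-or-all-out" selection.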

Similar articles

Sparse meta-analysis with high-dimensional data.

Biostatistics. 2016 Apr;17(2):205-20. doi: 10.1093/biostatistics/kxv038. Epub 2015 Sep 21.

Part 1. Statistical Learning Methods for the Effects of Multiple Air Pollution Constituents.

Res Rep Health Eff Inst. 2015 Jun(183 Pt 1-2):5-50.

Variable selection for multiply-imputed data with application to dioxin exposure study.

Stat Med. 2013 Sep 20;32(21):3646-59. doi: 10.1002/sim.5783. Epub 2013 Mar 25.

What should be expected from feature selection in small-sample settings.

Bioinformatics. 2006 Oct 1;22(19):2430-6. doi: 10.1093/bioinformatics/btl407. Epub 2006 Jul 26.

Meta-Analysis Based on Nonconvex Regularization.

Sci Rep. 2020 Apr 1;10(1):5755. doi: 10.1038/s41598-020-62473-2.

High-dimensional variable selection in meta-analysis for censored data.

Biometrics. 2011 Jun;67(2):504-12. doi: 10.1111/j.1541-0420.2010.01466.x. Epub 2010 Aug 5.

Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data.

Bioinformatics. 2005 Jul 1;21(13):3001-8. doi: 10.1093/bioinformatics/bti422. Epub 2005 Apr 6.

The ant colony algorithm for feature selection in high-dimension gene expression data for disease classification.

Math Med Biol. 2007 Dec;24(4):413-26. doi: 10.1093/imammb/dqn001. Epub 2008 Feb 22.

A comparison of meta-analysis methods for detecting differentially expressed genes in microarray experiments.

Bioinformatics. 2008 Feb 1;24(3):374-82. doi: 10.1093/bioinformatics/btm620. Epub 2008 Jan 18.

Cited by

Extensions of Heterogeneity in Integration and Prediction (HIP) With R Shiny Application.

Stat Med. 2025 Apr;44(8-9):e70036. doi: 10.1002/sim.70036.

Protocol: Machine learning for selecting moderators in meta-analysis: A systematic review of methods and their applications, and an evaluation using data on tutoring interventions.

Campbell Syst Rev. 2024 Dec 10;20(4):e70009. doi: 10.1002/cl2.70009. eCollection 2024 Dec.

HIP: a method for high-dimensional multi-view data integration and prediction accounting for subgroup heterogeneity.

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae470.

Multi-task Learning with High-Dimensional Noisy Images.

J Am Stat Assoc. 2024;119(545):650-663. doi: 10.1080/01621459.2022.2140052. Epub 2022 Nov 17.

Integrative Learning of Structured High-Dimensional Data from Multiple Datasets.

Stat Anal Data Min. 2023 Apr;16(2):120-134. doi: 10.1002/sam.11601. Epub 2022 Nov 8.

A novel meta-analysis based on data augmentation and elastic data shared lasso regularization for gene expression.

BMC Bioinformatics. 2022 Aug 23;23(Suppl 10):353. doi: 10.1186/s12859-022-04887-5.

Meta-Analyzing Multiple Omics Data With Robust Variable Selection.

Front Genet. 2021 Jul 5;12:656826. doi: 10.3389/fgene.2021.656826. eCollection 2021.

Simultaneous Covariance Inference for Multimodal Integrative Analysis.

J Am Stat Assoc. 2020;115(531):1279-1291. doi: 10.1080/01621459.2019.1623040. Epub 2019 Jun 28.

Modeling Between-Study Heterogeneity for Improved Replicability in Gene Signature Selection and Clinical Prediction.

J Am Stat Assoc. 2020;115(531):1125-1138. doi: 10.1080/01621459.2019.1671197. Epub 2019 Oct 29.

GRIA: Graphical Regularization for Integrative Analysis.

Proc SIAM Int Conf Data Min. 2020;2020:604-612. doi: 10.1137/1.9781611976236.68.
