Meta-analysis based variable selection for gene expression data.

Authors

Li Quefeng, Wang Sijian, Huang Chiang-Ching, Yu Menggang, Shao Jun

Affiliation

Department of Statistics, University of Wisconsin, Madison, Wisconsin, U.S.A.

Publication

Biometrics. 2014 Dec;70(4):872-80. doi: 10.1111/biom.12213. Epub 2014 Sep 5.

Abstract

Recent advances in biotechnology and their wide application have led to the generation of many high-dimensional gene expression data sets that can be used to address similar biological questions. Meta-analysis plays an important role in summarizing and synthesizing scientific evidence from multiple studies. When the dimensions of the data sets are high, it is desirable to incorporate variable selection into meta-analysis to improve model interpretation and prediction. To our knowledge, all existing methods conduct variable selection with meta-analyzed data in an "all-in-or-all-out" fashion, that is, a gene is either selected in all studies or not selected in any study. However, because of the data heterogeneity that commonly exists in meta-analyzed data, including choices of biospecimens, study population, and measurement sensitivity, a gene may be important in some studies and unimportant in others. In this article, we propose a novel method called meta-lasso for variable selection with high-dimensional meta-analyzed data. Through a hierarchical decomposition of the regression coefficients, our method not only borrows strength across multiple data sets to boost the power to identify important genes, but also keeps the selection flexibility among data sets to take data heterogeneity into account. We show that our method possesses gene selection consistency, that is, when the sample size of each data set is large, our method can, with high probability, identify all important genes and remove all unimportant genes. Simulation studies demonstrate good performance of our method. We applied our meta-lasso method to a meta-analysis of five cardiovascular studies. The analysis results are clinically meaningful.
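The abstract does not spell out the hierarchical decomposition. As a minimal sketch, assume each study-specific coefficient is written as a product of a shared gene-level factor and a study-level factor, with an L1 penalty on each level. That product form, the function names, and the tuning parameters `lam_gamma` and `lam_xi` below are illustrative assumptions, not details taken from the abstract. The sketch shows how such a decomposition can switch a gene off in every study at once or in a single study only.

```python
# Illustrative sketch only. The product form beta[m][j] = gamma[j] * xi[m][j]
# is an ASSUMED decomposition; the abstract does not state the exact form.

def compose_coefficients(gamma, xi):
    """Combine a shared gene-level factor with study-level factors."""
    return [[g * x for g, x in zip(gamma, xi_m)] for xi_m in xi]

def selected_genes(beta, tol=1e-12):
    """For each study, list the indices of genes with nonzero coefficients."""
    return [[j for j, b in enumerate(beta_m) if abs(b) > tol] for beta_m in beta]

def hierarchical_penalty(gamma, xi, lam_gamma=1.0, lam_xi=1.0):
    """Sum of L1 penalties on both levels (hypothetical tuning parameters)."""
    gene_level = sum(abs(g) for g in gamma)
    study_level = sum(abs(x) for xi_m in xi for x in xi_m)
    return lam_gamma * gene_level + lam_xi * study_level

# Three genes (indices 0..2) and two studies; all numbers are made up.
gamma = [1.5, 0.0, 2.0]   # gamma[1] == 0 removes gene 1 from every study
xi = [
    [1.0, 0.7, 0.5],      # study 0 keeps genes 0 and 2
    [0.8, 0.3, 0.0],      # xi[1][2] == 0 removes gene 2 from study 1 only
]

beta = compose_coefficients(gamma, xi)
selection = selected_genes(beta)
penalty = hierarchical_penalty(gamma, xi)

for m, genes in enumerate(selection):
    print(f"study {m}: selected gene indices {genes}")
```

Under this assumed form, a zero gene-level factor reproduces the "all-out" case, while a zero study-level factor gives the per-study flexibility the abstract describes. An actual fit would estimate both sets of factors from data by minimizing a penalized loss, which this sketch does not attempt.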

