Meta-Analyzing Multiple Omics Data With Robust Variable Selection.

Author information

Hu Zongliang, Zhou Yan, Tong Tiejun

Affiliations

College of Mathematics and Statistics, Shenzhen University, Shenzhen, China.

Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong.

Publication information

Front Genet. 2021 Jul 5;12:656826. doi: 10.3389/fgene.2021.656826. eCollection 2021.

Abstract

High-throughput omics data are increasingly common across many areas of science. Because many publicly available datasets address the same questions, researchers have applied meta-analysis to synthesize multiple datasets and obtain more reliable model estimates and predictions. Given the high dimensionality of omics data, it is also desirable to incorporate variable selection into meta-analysis. Existing variable selection methods for meta-analysis are often sensitive to outliers and may miss relevant covariates, especially when lasso-type penalties are used. In this paper, we develop a robust variable selection algorithm, based on logistic regression, for meta-analyzing high-dimensional datasets. We first search for an outlier-free subset in each dataset by borrowing information across datasets, repeatedly applying least trimmed squares estimates for the logistic model together with a hierarchical bi-level variable selection technique. After obtaining a reliable non-outlier subset, we then use a reweighting step to further improve efficiency. Simulation studies and real data analysis show that, in the presence of outliers, the new method provides more reliable results than existing meta-analysis methods.
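The trimming idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it handles a single dataset, uses a plain lasso penalty in place of the hierarchical bi-level penalty, trims on the per-observation negative log-likelihood, and omits the reweighting step. The function names (`fit_l1_logistic`, `trimmed_subset`) and all parameter values (`lam`, `keep_frac`, iteration counts) are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    # Clip to avoid overflow in exp for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def fit_l1_logistic(X, y, lam=0.05, n_iter=300):
    """Lasso-penalized logistic regression via proximal gradient (ISTA).

    Returns w with w[0] the unpenalized intercept and w[1:] the coefficients.
    """
    n = X.shape[0]
    Xa = np.hstack([np.ones((n, 1)), X])
    # Step size 1/L, where L = ||Xa||_2^2 / (4n) bounds the gradient's Lipschitz constant.
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2
    w = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        grad = Xa.T @ (sigmoid(Xa @ w) - y) / n
        w = w - step * grad
        # Soft-threshold every coefficient except the intercept.
        w[1:] = np.sign(w[1:]) * np.maximum(np.abs(w[1:]) - step * lam, 0.0)
    return w


def neg_loglik(X, y, w):
    """Per-observation negative log-likelihood under the fitted model."""
    p = sigmoid(w[0] + X @ w[1:])
    eps = 1e-12
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))


def trimmed_subset(X, y, keep_frac=0.8, lam=0.05, n_csteps=10, seed=0):
    """Alternate between fitting on a subset and keeping the best-fitting points."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    h = int(np.ceil(keep_frac * n))  # number of observations retained
    subset = np.sort(rng.choice(n, size=h, replace=False))
    for _ in range(n_csteps):
        w = fit_l1_logistic(X[subset], y[subset], lam)
        loss = neg_loglik(X, y, w)
        new_subset = np.sort(np.argsort(loss)[:h])  # h smallest losses
        if np.array_equal(new_subset, subset):
            break
        subset = new_subset
    # Final fit on the retained subset.
    w = fit_l1_logistic(X[subset], y[subset], lam)
    return subset, w
```

Observations left out of the returned subset are the candidates for outliers. In a meta-analysis setting, one such search would run per dataset, with the penalty coupling the coefficients across datasets; that coupling is not shown here.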

