
Meta-Analyzing Multiple Omics Data With Robust Variable Selection.

Authors

Hu Zongliang, Zhou Yan, Tong Tiejun

Affiliations

College of Mathematics and Statistics, Shenzhen University, Shenzhen, China.

Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong.

Publication information

Front Genet. 2021 Jul 5;12:656826. doi: 10.3389/fgene.2021.656826. eCollection 2021.

DOI:10.3389/fgene.2021.656826
PMID:34290735
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8288516/
Abstract

High-throughput omics data are increasingly common across many areas of science. Because many publicly available datasets address the same questions, researchers have applied meta-analysis to synthesize multiple datasets and obtain more reliable model estimates and predictions. Given the high dimensionality of omics data, it is also desirable to incorporate variable selection into meta-analysis. Existing variable selection methods for meta-analysis are often sensitive to outliers and may fail to detect relevant covariates, especially when lasso-type penalties are used. In this paper, we develop a robust variable selection algorithm for meta-analyzing high-dimensional datasets based on logistic regression. We first search for an outlier-free subset of each dataset by borrowing information across datasets, repeatedly applying least trimmed squares estimates for the logistic model together with a hierarchical bi-level variable selection technique. After obtaining a reliable non-outlier subset, we apply a reweighting step to further improve efficiency. Simulation studies and real data analysis show that the new method provides more reliable results than existing meta-analysis methods in the presence of outliers.
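
The trimming-then-reweighting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single dataset (no borrowing across studies), swaps the hierarchical bi-level penalty for a plain lasso penalty, and uses a trimmed-likelihood concentration loop in the spirit of least trimmed squares. All function names, the toy data, and the `min_prob` cutoff are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(b0, beta, x):
    return sigmoid(b0 + sum(b * v for b, v in zip(beta, x)))


def fit_lasso_logistic(X, y, rows, lam=0.02, steps=300, lr=0.5):
    """Proximal gradient descent for lasso-penalized logistic regression,
    using only the observations listed in `rows`."""
    p = len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(steps):
        g0, g = 0.0, [0.0] * p
        for i in rows:
            r = predict(b0, beta, X[i]) - y[i]
            g0 += r
            for j in range(p):
                g[j] += r * X[i][j]
        b0 -= lr * g0 / len(rows)
        for j in range(p):
            v = beta[j] - lr * g[j] / len(rows)
            # Soft-thresholding is the proximal step for the L1 penalty.
            beta[j] = math.copysign(max(abs(v) - lr * lam, 0.0), v)
    return b0, beta


def deviance(b0, beta, x, yi):
    """Per-observation deviance (minus twice the log-likelihood)."""
    p = min(max(predict(b0, beta, x), 1e-12), 1.0 - 1e-12)
    return -2.0 * (yi * math.log(p) + (1 - yi) * math.log(1.0 - p))


def trimmed_fit(X, y, keep_frac=0.8, c_steps=10, seed=0):
    """Alternate between fitting on a subset and re-selecting the
    h observations with the smallest deviance (concentration steps)."""
    n = len(y)
    h = round(keep_frac * n)
    subset = sorted(random.Random(seed).sample(range(n), h))
    b0, beta = fit_lasso_logistic(X, y, subset)
    for _ in range(c_steps):
        dev = [deviance(b0, beta, X[i], y[i]) for i in range(n)]
        new_subset = sorted(sorted(range(n), key=dev.__getitem__)[:h])
        if new_subset == subset:
            break
        subset = new_subset
        b0, beta = fit_lasso_logistic(X, y, subset)
    return b0, beta, subset


def reweight(X, y, b0, beta, min_prob=0.1):
    """Hypothetical reweighting rule: give weight 1 to every observation whose
    own label has fitted probability >= min_prob, weight 0 otherwise, then refit."""
    cutoff = -2.0 * math.log(min_prob)
    kept = [i for i in range(len(y)) if deviance(b0, beta, X[i], y[i]) <= cutoff]
    return fit_lasso_logistic(X, y, kept) + (kept,)


# Toy data: one informative feature, one noise feature, four mislabeled samples.
rng = random.Random(1)
X = [[-2.95 + 0.1 * i, rng.gauss(0.0, 1.0)] for i in range(60)]
y = [1 if row[0] > 0 else 0 for row in X]
flipped = [0, 1, 58, 59]
for i in flipped:
    y[i] = 1 - y[i]

b0, beta, subset = trimmed_fit(X, y)
b0_rw, beta_rw, kept = reweight(X, y, b0, beta)
print("flipped samples kept after reweighting:", sorted(set(flipped) & set(kept)))
```

The trimmed fit deliberately discards a fixed fraction of the data, including some clean observations. The reweighting step then brings back every observation that the robust fit explains reasonably well, which is where the efficiency gain mentioned in the abstract comes from.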
