Unifying and Generalizing Methods for Removing Unwanted Variation Based on Negative Controls

Authors

Gerard David, Stephens Matthew

Affiliations

Department of Mathematics and Statistics, American University, Washington, DC 20016, USA.

Departments of Human Genetics and Statistics, University of Chicago, Chicago, IL 60637, USA.

Publication information

Stat Sin. 2021 Jul;31(3):1145-1166. doi: 10.5705/ss.202018.0345.

DOI: 10.5705/ss.202018.0345

PMID: 38148787

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10751021/

Abstract

Unwanted variation, including hidden confounding, is a well-known problem in many fields, but particularly in large-scale gene expression studies. Recent proposals to use control genes, genes assumed to be unassociated with the covariates of interest, have led to new methods to deal with this problem. Several versions of these removing unwanted variation (RUV) methods have been proposed, including RUV1, RUV2, RUV4, RUVinv, RUVrinv, and RUVfun. Here, we introduce a general framework, RUV*, that both unites and generalizes these approaches. This unifying framework helps clarify the connections between existing methods. In particular, we provide conditions under which RUV2 and RUV4 are equivalent. The RUV* framework preserves an advantage of the RUV approaches, namely, their modularity, which facilitates the development of novel methods based on existing matrix imputation algorithms. We illustrate this by implementing RUVB, a version of RUV* based on Bayesian factor analysis. In realistic simulations based on real data, we found RUVB to be competitive with existing methods in terms of both power and calibration. However, providing a consistently reliable calibration among the data sets remains challenging.
