Monte Carlo simulation of OLS and linear mixed model inference of phenotypic effects on gene expression

Department of Biological Sciences, University of Southern Maine, Portland, ME, United States.

PeerJ. 2016 Oct 11;4:e2575. doi: 10.7717/peerj.2575. eCollection 2016.

BACKGROUND

Self-contained tests estimate and test the association between a phenotype and mean expression level in a gene set defined a priori. Many self-contained gene set analysis methods have been developed, but their performance for phenotypes that are continuous rather than discrete, and in the presence of multiple nuisance covariates, has not been well studied. Here, I use Monte Carlo simulation to evaluate the performance of both novel and previously published (and readily available via R) methods for inferring effects of a continuous predictor on mean expression in the presence of nuisance covariates. The motivating data are a high-profile dataset that was used to show opposing effects of hedonic and eudaimonic well-being (or happiness) on the mean expression level of a set of genes that has been correlated with social adversity (the CTRA gene set). The original analysis of these data used a linear model (GLS) of fixed effects with correlated error to infer effects of hedonic and eudaimonic well-being on mean CTRA expression.

METHODS

The standardized effects of hedonic and eudaimonic well-being on CTRA gene set expression estimated by GLS were compared to estimates using multivariate (OLS) linear models and generalized estimating equation (GEE) models. The OLS estimates were tested using O'Brien's OLS test, Anderson's permutation [Formula: see text]-test, two permutation F-tests (including GlobalAncova), and a rotation test (Roast). The GEE estimates were tested using a Wald test with robust standard errors. The performance (Type I, II, S, and M errors) of all tests was investigated using a Monte Carlo simulation of data explicitly modeled on the re-analyzed dataset.
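As a rough illustration of the kind of permutation test named above, the following sketch tests a continuous predictor against a multivariate expression matrix while adjusting for nuisance covariates. It is not the paper's code: the function name `permutation_f_test`, the F-like statistic (extra sum of squares over residual sum of squares, pooled across genes), and the choice of permuting reduced-model residuals (the Freedman-Lane scheme) are all assumptions made for illustration.

```python
import numpy as np


def _rss(X, Y):
    """Residual sum of squares of the OLS fit of Y on X, summed over all columns of Y."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return float(np.sum(resid ** 2))


def permutation_f_test(Y, x, Z, n_perm=999, seed=0):
    """Permutation test of predictor x on expression matrix Y (subjects x genes),
    adjusting for nuisance covariates Z (subjects x covariates).

    Hypothetical helper: permutes residuals of the reduced (covariates-only)
    model, which keeps each subject's gene-gene correlation intact.
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    X_red = np.hstack([np.ones((n, 1)), Z])          # intercept + nuisance covariates
    X_full = np.hstack([X_red, x.reshape(-1, 1)])    # ... plus the predictor of interest

    def stat(Yc):
        rss_full = _rss(X_full, Yc)
        rss_red = _rss(X_red, Yc)
        return (rss_red - rss_full) / rss_full       # F-like, pooled over genes

    # Fit the reduced model once; its residuals are what gets permuted.
    beta_red, *_ = np.linalg.lstsq(X_red, Y, rcond=None)
    fitted = X_red @ beta_red
    resid = Y - fitted

    observed = stat(Y)
    n_extreme = 0
    for _ in range(n_perm):
        idx = rng.permutation(n)                     # shuffle whole rows (subjects)
        if stat(fitted + resid[idx]) >= observed:
            n_extreme += 1
    p_value = (n_extreme + 1) / (n_perm + 1)         # count the observed data as one permutation
    return observed, p_value
```

Shuffling whole rows rather than individual cells is the design choice that matters here: the test statistic's null distribution then reflects the correlation among genes without having to model it.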

RESULTS

GLS estimates are inconsistent between datasets, and, in each dataset, at least one coefficient is large and highly statistically significant. By contrast, effects estimated by OLS or GEE are very small, especially relative to the standard errors. Bootstrap and permutation GLS distributions suggest that GLS yields downward-biased standard errors and inflated coefficients. The Monte Carlo simulation of error rates shows highly inflated Type I error from the GLS test and slightly inflated Type I error from the GEE test. By contrast, Type I error for all OLS tests is at the nominal level. The permutation F-tests have ∼1.9X the power of the other OLS tests. This increased power comes at a cost of high sign error (∼10%) if tested on small effects.
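The error rates reported above can be estimated by repeated simulation. The sketch below does this for a plain single-outcome OLS test, not for the paper's gene-set tests: the data-generating model, the function name `simulate_errors`, and all parameter values are assumptions, and a normal approximation to the t distribution is used for the critical value. Type S is taken as the share of significant estimates with the wrong sign and Type M as the mean absolute significant estimate divided by the true effect.

```python
import numpy as np
from statistics import NormalDist


def simulate_errors(beta, n=200, n_sim=2000, alpha=0.05, rho=0.3, seed=1):
    """Monte Carlo estimate of rejection rate, Type S and Type M error for the
    OLS test of a continuous predictor x with one nuisance covariate z.

    Hypothetical setup: y = beta * x + 0.5 * z + noise, corr(x, z) = rho.
    """
    rng = np.random.default_rng(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # normal approximation to the t critical value
    estimates = np.empty(n_sim)
    significant = np.zeros(n_sim, dtype=bool)

    for i in range(n_sim):
        z = rng.standard_normal(n)
        x = rho * z + np.sqrt(1 - rho ** 2) * rng.standard_normal(n)
        y = beta * x + 0.5 * z + rng.standard_normal(n)

        X = np.column_stack([np.ones(n), z, x])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        sigma2 = resid @ resid / (n - X.shape[1])    # unbiased residual variance
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[2, 2])

        estimates[i] = coef[2]
        significant[i] = abs(coef[2] / se) > z_crit

    # With beta == 0 the rejection rate is the Type I error; otherwise it is power.
    result = {"reject_rate": float(significant.mean()),
              "type_s": float("nan"), "type_m": float("nan")}
    if beta != 0 and significant.any():
        sig = estimates[significant]
        result["type_s"] = float(np.mean(np.sign(sig) != np.sign(beta)))
        result["type_m"] = float(np.mean(np.abs(sig)) / abs(beta))
    return result
```

Calling `simulate_errors(0.0)` gives a Type I error estimate, while a small nonzero `beta` shows how significant estimates of a small effect tend to exaggerate its magnitude and occasionally carry the wrong sign.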

DISCUSSION

The apparently replicated pattern of well-being effects on gene expression is most parsimoniously explained as "correlated noise" due to the geometry of multiple regression. GLS for fixed effects with correlated error, or any linear mixed model for estimating fixed effects in designs with many repeated measures or outcomes, should be used cautiously because of the inflated Type I and M error. By contrast, all OLS tests perform well, and the permutation F-tests have superior performance, including moderate power for very small effects.
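One reading of the "geometry of multiple regression" remark is that, when two predictors are positively correlated, the sampling errors of their partial regression coefficients are negatively correlated, so pure noise tends to produce estimates of opposite sign. The sketch below illustrates that general property only; the function name `coefficient_correlation` and the simulated design are assumptions and are not taken from the paper or its data.

```python
import numpy as np


def coefficient_correlation(r, n=100, n_sim=2000, seed=2):
    """Correlation between the two OLS slope estimates when the responses are
    pure noise and the two predictors have population correlation r.

    Hypothetical illustration: one fixed design, many independent noise outcomes.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    x2 = r * x1 + np.sqrt(1 - r ** 2) * rng.standard_normal(n)
    X = np.column_stack([np.ones(n), x1, x2])

    # Each column of Y is an independent outcome with no true effect of x1 or x2.
    Y = rng.standard_normal((n, n_sim))
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)     # coef has shape (3, n_sim)

    return float(np.corrcoef(coef[1], coef[2])[0, 1])
```

For strongly correlated predictors the returned value is strongly negative, which is the sense in which an apparently opposing pair of effects can arise from noise alone.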
