Verhulst Brad, Prom-Wormley Elizabeth, Keller Matthew, Medland Sarah, Neale Michael C
Department of Psychology, Michigan State University, East Lansing, USA.
Family Medicine and Population Health, Virginia Commonwealth University, Richmond, USA.
Behav Genet. 2019 Jan;49(1):99-111. doi: 10.1007/s10519-018-9942-y. Epub 2018 Dec 20.
For many multivariate twin models, the numerical Type I error rates are lower than theoretically expected rates using a likelihood ratio test (LRT), which implies that the significance threshold for statistical hypothesis tests is more conservative than most twin researchers realize. This makes the numerical Type II error rates higher than theoretically expected. Furthermore, the discrepancy between the observed and expected error rates increases as more variables are included in the analysis and can have profound implications for hypothesis testing and statistical inference. In two simulation studies, we examine the Type I error rates for the Cholesky decomposition and Correlated Factors models. Both show markedly lower than nominal Type I error rates under the null hypothesis, a discrepancy that increases with the number of variables in the model. In addition, we observe slightly biased parameter estimates for the Cholesky decomposition and Correlated Factors models. By contrast, if the variance-covariance matrices for variance components are estimated directly (without constraints), the numerical Type I error rates are consistent with theoretical expectations and there is no bias in the parameter estimates regardless of the number of variables analyzed. We call this the direct symmetric approach. It appears that each model-implied boundary, whether explicit or implicit, increases the discrepancy between the numerical and theoretical Type I error rates by truncating the sampling distributions of the variance components and inducing bias in the parameters. The direct symmetric approach has several advantages over other multivariate twin models as it corrects the Type I error rate and parameter bias issues, is easy to implement in current software, and has fewer optimization problems. Implications for past and future research, and potential limitations associated with direct estimation of genetic and environmental covariance matrices are discussed.
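The boundary mechanism described in the abstract can be illustrated with a much simpler analogue than a twin model. The sketch below is not the authors' simulation: it assumes a one-parameter normal-mean problem with known unit variance, in which the parameter is either estimated freely or constrained to be non-negative, loosely mirroring a variance component bounded at zero. The function name `simulate_type1_rates` and all settings are hypothetical choices made for illustration.

```python
import random

# 95th percentile of a chi-square distribution with 1 degree of freedom,
# i.e. the usual critical value for a 1-df LRT at alpha = 0.05.
CRITICAL_CHI2_1DF = 3.841458820694124


def simulate_type1_rates(n_reps=20000, n_obs=25, seed=1):
    """Return (free_rate, bounded_rate): empirical Type I error rates of a
    1-df LRT of mu = 0 when mu is unconstrained versus constrained to mu >= 0.

    Toy analogue only: data are N(0, 1) with known variance, so the null
    hypothesis is true in every replicate.
    """
    rng = random.Random(seed)
    reject_free = 0
    reject_bounded = 0
    for _ in range(n_reps):
        xbar = sum(rng.gauss(0.0, 1.0) for _ in range(n_obs)) / n_obs
        # Unconstrained MLE is xbar, so the LRT statistic is n * xbar^2.
        lrt_free = n_obs * xbar ** 2
        # With the bound mu >= 0 the MLE is max(xbar, 0): negative estimates
        # are truncated to the boundary and the LRT statistic becomes 0.
        lrt_bounded = n_obs * max(xbar, 0.0) ** 2
        reject_free += lrt_free > CRITICAL_CHI2_1DF
        reject_bounded += lrt_bounded > CRITICAL_CHI2_1DF
    return reject_free / n_reps, reject_bounded / n_reps


if __name__ == "__main__":
    free_rate, bounded_rate = simulate_type1_rates()
    print(f"unconstrained: {free_rate:.4f}  bounded: {bounded_rate:.4f}")
```

In this toy setting the unconstrained test should reject at close to the nominal 0.05 rate, while the bounded test should reject about half as often, because roughly half of the sampling distribution is truncated at the boundary. The truncated estimate `max(xbar, 0)` is also biased upward, which parallels the parameter bias the abstract reports for bounded models.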