Lin Wan-Yu
Institute of Health Data Analytics and Statistics, College of Public Health, National Taiwan University, Taipei, Taiwan.
Master of Public Health Program, College of Public Health, National Taiwan University, Taipei, Taiwan.
Front Genet. 2025 Sep 1;16:1617504. doi: 10.3389/fgene.2025.1617504. eCollection 2025.
Detecting variance quantitative trait loci (vQTLs) can facilitate the discovery of gene-environment (GxE) and gene-gene (GxG) interactions. Identifying vQTLs before running direct GxE and GxG analyses can considerably reduce the number of tests and, with it, the multiple-testing penalty.
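To make the reduction in test count concrete, the following is a minimal sketch under stated assumptions. The function names, the SNP count `m`, and the vQTL count `k` are hypothetical and chosen only for illustration; they are not taken from the study. The sketch counts pairwise GxG tests only and ignores the first-stage vQTL scan itself.

```python
from math import comb


def n_tests_exhaustive_gxg(m: int) -> int:
    """Number of pairwise GxG tests when every SNP pair among m SNPs is tested."""
    return comb(m, 2)


def n_tests_screened_gxg(m: int, k: int) -> int:
    """Number of pairwise GxG tests when only pairs involving one of k
    screened vQTLs are tested (hypothetical two-stage design).

    Each vQTL is paired with the other m - 1 SNPs; pairs made of two
    vQTLs would be counted twice, so comb(k, 2) is subtracted.
    """
    return k * (m - 1) - comb(k, 2)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Illustrative numbers only (assumed, not from the source).
m, k = 1000, 20
exhaustive = n_tests_exhaustive_gxg(m)
screened = n_tests_screened_gxg(m, k)
print(exhaustive, bonferroni_threshold(exhaustive))
print(screened, bonferroni_threshold(screened))
```

Under these assumed numbers, screening shrinks the set of SNP pairs by more than an order of magnitude, which relaxes the per-test threshold accordingly. Whether the first-stage scan also needs to enter the correction depends on the study design and is not addressed here.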
Although several methods have been proposed for vQTL detection, few studies have compared them head-to-head on false positive rates (FPRs), power, and computational time simultaneously. This work compares three parametric and two non-parametric vQTL tests.
Simulation studies show that the deviation regression model (DRM) and the Kruskal-Wallis test (KW) are the most recommended parametric and non-parametric tests, respectively. The quantile integral linear model (QUAIL, non-parametric) appropriately preserves the FPR for both normally and non-normally distributed traits. However, its power is never among the highest, and its computational time is much longer than that of its competitors. The Brown-Forsythe test (BF, parametric) can suffer from severe FPR inflation when a SNP's minor allele frequency is below 0.2. The double generalized linear model (DGLM, parametric) is not valid for non-normally distributed traits, although it is the most powerful method for normally distributed traits.
Considering robustness to outliers and computation time, I chose KW to analyze four lipid traits in the Taiwan Biobank. I further showed that GxE and GxG interactions were enriched among the 30 vQTLs identified from the four lipid traits.