Joint variable selection and network modeling for detecting eQTLs

Cao Xuan, Ding Lili, Mersha Tesfaye B

Division of Statistics and Data Science, Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH 45221, USA.

Division of Biostatistics and Epidemiology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH 45229, USA.

Stat Appl Genet Mol Biol. 2020 Feb 20;19(1). doi: 10.1515/sagmb-2019-0032.

In this study, we compare three recent statistical methods for joint variable selection and covariance estimation, applied to detecting expression quantitative trait loci (eQTL) and estimating gene networks, and we introduce a new hierarchical Bayesian method that is included in the comparison. Unlike the traditional univariate regression approach to eQTL analysis, all four methods relate phenotypes to genotypes through multivariate regression models that incorporate the dependence among phenotypes, and they use Bayesian multiplicity adjustment to avoid the burden imposed by traditional multiple testing correction. We present the performance of the three existing methods, MSSL (Multivariate Spike and Slab Lasso), SSUR (Sparse Seemingly Unrelated Bayesian Regression), and OBFBF (Objective Bayes Fractional Bayes Factor), together with the proposed method, JDAG (Joint estimation via a Gaussian Directed Acyclic Graph model), in simulation experiments and on publicly available HapMap data, taking asthma as an example. Compared with the existing methods, JDAG identified networks with higher sensitivity and specificity under row-wise sparse settings. JDAG requires less execution time in small-to-moderate dimensions, but it is not currently applicable to high-dimensional data. The eQTL analysis of the asthma data recovered a number of known gene regulations, such as those involving STARD3, IKZF3 and PGAP3, all of which have been reported in asthma studies. The code for the proposed method is freely available on GitHub (https://github.com/xuan-cao/Joint-estimation-for-eQTL).
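The abstract compares methods by sensitivity and specificity. As a minimal sketch of how these two quantities can be computed for a recovered sparsity pattern, the Python snippet below compares a true 0/1 support matrix with an estimated one. The function name `sensitivity_specificity` and the toy SNP-by-gene matrices are hypothetical illustrations, not taken from the authors' repository or simulation design.

```python
import numpy as np

def sensitivity_specificity(true_support, est_support):
    """Compare two 0/1 support matrices of the same shape.

    Hypothetical helper for illustration only; not the authors' code.
    Sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    true_support = np.asarray(true_support, dtype=bool)
    est_support = np.asarray(est_support, dtype=bool)
    if true_support.shape != est_support.shape:
        raise ValueError("supports must have the same shape")

    tp = np.sum(true_support & est_support)    # true signals that were found
    fn = np.sum(true_support & ~est_support)   # true signals that were missed
    tn = np.sum(~true_support & ~est_support)  # true zeros left at zero
    fp = np.sum(~true_support & est_support)   # true zeros wrongly selected

    sens = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    spec = tn / (tn + fp) if (tn + fp) > 0 else float("nan")
    return float(sens), float(spec)

# Toy example (assumed data): 3 SNPs (rows) by 4 genes (columns).
# The truth is row-wise sparse: only the first SNP affects any gene.
true_B = np.array([[1, 1, 1, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])
est_B = np.array([[1, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 1, 0, 0]])

sens, spec = sensitivity_specificity(true_B, est_B)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

In this toy case two of the three true associations are recovered and one of the nine true zeros is wrongly selected. The same function could be applied to the off-diagonal entries of an estimated gene network, assuming the network is encoded as a 0/1 adjacency matrix.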
