Hua Xing, Goedert James J, Landi Maria Teresa, Shi Jianxin
Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Md., USA.
Hum Hered. 2016;81(2):117-126. doi: 10.1159/000448733. Epub 2017 Jan 12.
Host genetics has recently been reported to affect human microbiome composition. We previously developed a statistical framework, microbiomeGWAS, to identify host genetic variants associated with microbiome composition by testing a distance matrix. However, statistical power depends on the choice of the microbiome distance matrix. To achieve more robust statistical power, we aim to extend microbiomeGWAS to test for association with many distance matrices, which are defined based on multilevel taxa abundances and phylogenetic information.
The main challenge is to accurately and rapidly evaluate the significance for millions of SNPs. We propose methods for approximating p values by correcting for the multiple testing introduced by testing many distance matrices and by correcting for the skewness and kurtosis of score statistics.
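To make the multiple-testing problem concrete, the following is a minimal sketch of a brute-force permutation baseline for testing one SNP against several distance matrices. It is not the analytic approximation proposed here, which is designed precisely to avoid permutations at genome-wide scale. The pairwise score statistic (sum over sample pairs of microbiome distance times centered genotype distance), the use of the maximum standardized statistic across matrices, and the function names `pair_stat` and `max_stat_permutation_pvalue` are all illustrative assumptions, not taken from the microbiomeGWAS package.

```python
import numpy as np

def pair_stat(g, d_upper, iu):
    """Assumed score statistic: for each distance matrix, sum over sample
    pairs of (microbiome distance) * (centered genotype distance)."""
    gd = np.abs(g[:, None] - g[None, :])[iu]   # pairwise genotype distances
    return d_upper @ (gd - gd.mean())          # one statistic per matrix, shape (K,)

def max_stat_permutation_pvalue(g, distance_matrices, n_perm=999, seed=0):
    """Permutation p value for the maximum standardized statistic over K
    distance matrices (hypothetical baseline, one SNP at a time)."""
    rng = np.random.default_rng(seed)
    g = np.asarray(g, dtype=float)
    iu = np.triu_indices(g.shape[0], k=1)
    # Stack the upper triangles of all K distance matrices: shape (K, n_pairs).
    d_upper = np.stack([np.asarray(D, dtype=float)[iu] for D in distance_matrices])

    obs = pair_stat(g, d_upper, iu)
    perm = np.empty((n_perm, d_upper.shape[0]))
    for b in range(n_perm):
        # Permuting genotypes breaks any SNP-microbiome association while
        # keeping the correlation among the K distance matrices intact.
        perm[b] = pair_stat(rng.permutation(g), d_upper, iu)

    # Standardize each matrix's statistic by its permutation mean and SD.
    mu = perm.mean(axis=0)
    sd = perm.std(axis=0, ddof=1)
    z_obs = (obs - mu) / sd
    z_perm = (perm - mu) / sd

    # Taking the maximum over matrices inside every permutation accounts for
    # having tested K matrices; the test is one-sided (large values = association).
    exceed = np.sum(z_perm.max(axis=1) >= z_obs.max())
    p_value = (1 + exceed) / (1 + n_perm)
    return float(p_value), z_obs
```

With `n_perm` permutations the smallest attainable p value is 1/(n_perm + 1), so reaching genome-wide significance thresholds for millions of SNPs this way would be computationally prohibitive, which is the motivation for a fast analytic approximation.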
The accuracy of p value approximation was verified by simulations. We applied our method to a set of 147 lung cancer patients with 16S rRNA microbiome profiles from nonmalignant lung tissues. We show that correcting for skewness and kurtosis eliminated dramatic deviations in the quantile-quantile plot.
We developed computationally efficient methods for identifying host genetic variants associated with microbiome composition by testing many distance matrices. The algorithms are implemented in the package microbiomeGWAS (https://github.com/lsncibb/microbiomeGWAS).