Stewart W C L, Hager V R
Nationwide Children's Hospital, Columbus, OH, USA.
Department of Pediatrics, The Ohio State University, Columbus, OH, USA.
Heredity (Edinb). 2016 Aug;117(2):109-13. doi: 10.1038/hdy.2016.33. Epub 2016 Jun 1.
In the analysis of DNA sequences on related individuals, most methods strive to incorporate as much information as possible, with little or no attention paid to the issue of statistical significance. For example, a modern workstation can easily handle the computations needed to perform a large-scale genome-wide inheritance-by-descent (IBD) scan, but accurate assessment of the significance of that scan is often hindered by inaccurate approximations and computationally intensive simulation. To address these issues, we developed gLOD, a test of co-segregation that, for large samples, models chromosome-specific IBD statistics as a collection of stationary Gaussian processes. With this simple model, the parametric bootstrap yields an accurate and rapid assessment of significance: the genome-wide corrected P-value. Furthermore, we show that (i) under the null hypothesis, the limiting distribution of the gLOD is the standard Gumbel distribution; (ii) our parametric bootstrap simulator is approximately 40 000 times faster than gene-dropping methods, and it is more powerful than methods that approximate the adjusted P-value; and (iii) the gLOD has the same statistical power as the widely used maximum Kong and Cox LOD. Thus, our approach gives researchers the ability to determine quickly and accurately the significance of most large-scale IBD scans, which may contain multiple traits, thousands of families and tens of thousands of DNA sequences.
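The parametric bootstrap described in the abstract can be illustrated with a minimal sketch. This is not the authors' gLOD software. It assumes, purely for illustration, that each chromosome's standardized IBD statistic follows a unit-variance stationary Gaussian process with exponentially decaying autocorrelation (an Ornstein-Uhlenbeck process sampled on an evenly spaced grid) and that the scan statistic is the one-sided genome-wide maximum. The function names, the grid spacing `step_cm`, and the decay rate `corr_decay` are hypothetical and would have to be estimated from, or matched to, the actual data.

```python
import math
import random


def simulate_null_max(chrom_lengths_cm, step_cm, corr_decay, rng):
    """Simulate one null genome scan and return its maximum.

    Assumption (not from the abstract): each chromosome is an independent,
    unit-variance Ornstein-Uhlenbeck process sampled every `step_cm` cM.
    """
    rho = math.exp(-corr_decay * step_cm)   # correlation between adjacent grid points
    innov_sd = math.sqrt(1.0 - rho * rho)   # keeps the marginal variance equal to 1
    best = -math.inf
    for length in chrom_lengths_cm:
        z = rng.gauss(0.0, 1.0)             # start in the stationary N(0, 1) law
        best = max(best, z)
        for _ in range(int(length / step_cm)):
            z = rho * z + innov_sd * rng.gauss(0.0, 1.0)
            best = max(best, z)
    return best


def bootstrap_p_value(observed_max, chrom_lengths_cm, step_cm=1.0,
                      corr_decay=0.04, n_reps=2000, seed=0):
    """Estimate a genome-wide corrected P-value by parametric bootstrap.

    The P-value is the fraction of simulated null scans whose maximum is at
    least `observed_max`, with the usual +1 correction so it is never zero.
    `corr_decay=0.04` is an illustrative placeholder, not a fitted value.
    """
    rng = random.Random(seed)
    exceed = sum(
        simulate_null_max(chrom_lengths_cm, step_cm, corr_decay, rng) >= observed_max
        for _ in range(n_reps)
    )
    return (exceed + 1) / (n_reps + 1)


if __name__ == "__main__":
    # Hypothetical three-chromosome genome with lengths in centimorgans.
    print(bootstrap_p_value(3.5, [280.0, 260.0, 220.0]))
```

Because each replicate only draws correlated normal variates rather than dropping genes through pedigrees, the cost per replicate grows with the number of grid points and not with the number of families, which is the kind of saving the abstract attributes to the Gaussian-process model.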