Assessing genomewide statistical significance in linkage studies.

Author information

Lin D Y, Zou Fei

Affiliation

Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599, USA.

Publication information

Genet Epidemiol. 2004 Nov;27(3):202-14. doi: 10.1002/gepi.20017.

PMID:15389929

Abstract

Assessment of genomewide statistical significance in multipoint linkage analysis is a thorny problem. The existing analytical solutions rely on strong assumptions (i.e., infinitely dense or equally spaced genetic markers that are fully informative and completely observed, and a single type of relative pair) which are rarely satisfied in real human studies, while simulation-based methods are computationally intensive and may not be applicable to complex data structures and sophisticated genetic models. Here, we propose a conceptually simple and numerically efficient Monte Carlo procedure for determining genomewide significance levels that is applicable to all linkage studies. The pedigree structure is completely general; the marker data are totally arbitrary in respect to number, spacing, informativeness, and missingness; the trait can be qualitative, quantitative, or multivariate; the alternative hypothesis can be two-sided or one-sided; and the statistic can be parametric or nonparametric. The usefulness of the proposed approach is demonstrated through extensive simulation studies and an application to the nuclear family data from the Tenth Genetic Analysis Workshop.
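
The abstract does not spell out the Monte Carlo procedure, so the following is only a minimal sketch of one way such a genomewide threshold can be obtained, not the authors' implementation. It assumes that each family contributes a score at every genome position with mean zero under the null hypothesis of no linkage, and it approximates the null distribution of the maximum statistic by multiplying each family's scores by an independent standard normal draw (a multiplier resampling scheme, which is an assumption here). The names `toy_scores` and `genomewide_threshold`, the AR(1) toy data, and all parameter values are hypothetical.

```python
import numpy as np


def toy_scores(n_families=200, n_positions=50, rho=0.9, seed=1):
    """Hypothetical null data: per-family scores correlated along the genome."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_families, n_positions))
    scores = np.empty_like(noise)
    scores[:, 0] = noise[:, 0]
    for j in range(1, n_positions):
        # AR(1) dependence stands in for correlation between nearby positions.
        scores[:, j] = rho * scores[:, j - 1] + np.sqrt(1 - rho**2) * noise[:, j]
    return scores


def genomewide_threshold(scores, alpha=0.05, n_draws=2000, seed=0):
    """Two-sided genomewide threshold and p-value by multiplier resampling.

    scores: array of shape (n_families, n_positions), assumed mean zero
    under the null at every position.
    """
    rng = np.random.default_rng(seed)
    n_families = scores.shape[0]

    # Standardize the summed scores at each position.
    scale = np.sqrt((scores**2).sum(axis=0))
    observed = scores.sum(axis=0) / scale

    # Each draw reweights whole families, so the correlation between
    # positions is preserved without re-simulating marker genotypes.
    multipliers = rng.standard_normal((n_draws, n_families))
    null_stats = (multipliers @ scores) / scale

    # Null distribution of the genomewide maximum of |statistic|.
    null_max = np.abs(null_stats).max(axis=1)
    threshold = np.quantile(null_max, 1 - alpha)
    p_value = float((null_max >= np.abs(observed).max()).mean())
    return observed, threshold, p_value


if __name__ == "__main__":
    observed, threshold, p_value = genomewide_threshold(toy_scores())
    print(f"max |statistic| = {np.abs(observed).max():.2f}")
    print(f"genomewide 5% threshold = {threshold:.2f}, p-value = {p_value:.3f}")
```

Because the resampling acts on the observed scores, this kind of scheme places no restriction on marker number, spacing, or missingness, which is the generality the abstract emphasizes. A one-sided alternative would use the maximum of the statistic itself rather than of its absolute value.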
