Multipoint linkage disequilibrium mapping using multilocus allele frequency data.

Author information

Johnson T

Affiliation

Rothamsted Research, Harpenden, and School of Biological Sciences, University of Edinburgh, West Mains Road, Edinburgh EH9 3JT, Scotland, UK.

Publication information

Ann Hum Genet. 2005 Jul;69(Pt 4):474-97. doi: 10.1046/j.1529-8817.2005.00178.x.

DOI:10.1046/j.1529-8817.2005.00178.x

PMID:15996175

Abstract

This paper describes a likelihood-based, fine-scale linkage disequilibrium mapping method for estimating the position of a disease-predisposing gene relative to a battery of typed marker loci. The method uses multilocus allele frequency data from a sample of unrelated diseased individuals and from a sample of unrelated control individuals, that is, a case-control design. Data of this type could be obtained by typing DNA pools, which is less expensive than typing individuals separately. The method uses a nonparametric model that makes it robust to the shape of the genealogy at the disease locus. It can be implemented efficiently, making a multipoint analysis of a data set of a thousand markers feasible. An example power analysis uses simulations to estimate the amount of information that can be extracted from fully resolved haplotype data, relative to multilocus allele frequency data. For the assumed parameter values and a battery of 10 markers, region estimates derived from haplotype data are roughly three times narrower than those derived from allele frequency data alone. Depending on how information is measured, allele frequency data at approximately 18 or approximately 33 additional markers are needed to compensate for this loss of information.