Rao S, Xia L
Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio 44106, USA.
Genetica. 2000;109(3):183-97. doi: 10.1023/a:1017507624695.
The search for efficient and powerful statistical methods and optimal mapping strategies for categorical traits under various experimental designs continues to be one of the main tasks in genetic mapping studies. Methodologies for genetic mapping of categorical traits can generally be classified into two groups: linear and non-linear models. We develop a method based on a threshold model, termed the mixture threshold model, to handle ordinal (or binary) data from multiple families. Monte Carlo simulations are used to compare the statistical efficiency and properties of the proposed non-linear model with those of a linear model for genetic mapping of categorical traits using multiple families. The mixture threshold model has notably higher statistical power than linear models. There may be an optimal sampling strategy (family size vs. number of families) at which genetic mapping reaches its maximal power and minimal estimation errors. A single large-sibship family does not necessarily produce the maximal power for detection of quantitative trait loci (QTL), owing to genetic sampling of QTL alleles. The QTL allelic model has a marked impact on the efficiency of genetic mapping of categorical traits in terms of statistical power and QTL parameter estimation. Compared with a fixed number of QTL alleles (two or four), the model with an infinite number of QTL alleles and normally distributed allelic effects results in a loss of statistical power. The results imply that, for genetic mapping of categorical traits using data from multiple families, it is desirable either to use inbred designs (e.g. F2 or four-way crosses) in which only a few QTL alleles segregate, or to reduce the number of QTL alleles in outbred populations (e.g. by selection).
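As an illustration of the general threshold idea the abstract refers to, the following minimal Python sketch maps an unobserved continuous liability to an ordinal category and simulates one sibship. It is not the authors' mixture threshold model or their simulation design. The function names, the biallelic QTL with allele frequency 0.5, the additive effect coding, and the standard normal residual are all assumptions made for illustration.

```python
import bisect
import random


def liability_to_category(liability, thresholds):
    """Map a continuous liability to an ordinal category.

    `thresholds` must be sorted in ascending order. The result lies in
    0..len(thresholds); a liability exactly equal to a threshold falls
    in the upper category.
    """
    return bisect.bisect_right(thresholds, liability)


def simulate_sibship(n_sibs, qtl_effect, thresholds, rng):
    """Simulate ordinal phenotypes for one full-sib family.

    Assumptions (hypothetical, not taken from the paper):
    - a biallelic QTL with allele frequency 0.5 in the parents,
    - an additive effect of `qtl_effect` per allele copy, centred at one copy,
    - a standard normal residual on the liability scale.
    """
    # Each parent carries two alleles, coded 0 or 1.
    father = [int(rng.random() < 0.5) for _ in range(2)]
    mother = [int(rng.random() < 0.5) for _ in range(2)]

    phenotypes = []
    for _ in range(n_sibs):
        # Each sib inherits one randomly chosen allele from each parent.
        n_copies = rng.choice(father) + rng.choice(mother)
        liability = qtl_effect * (n_copies - 1) + rng.gauss(0.0, 1.0)
        phenotypes.append(liability_to_category(liability, thresholds))
    return phenotypes


if __name__ == "__main__":
    rng = random.Random(2000)
    # Two thresholds define three ordered categories: 0, 1 and 2.
    print(simulate_sibship(n_sibs=8, qtl_effect=0.5,
                           thresholds=[0.0, 1.0], rng=rng))
```

Because both parents' alleles are drawn once per family, some simulated families will not segregate for the QTL at all. This gives a rough intuition for why a single large sibship need not give maximal power.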