Sen Saunak, Satagopan Jaya M, Churchill Gary A
Department of Epidemiology and Biostatistics, University of California, San Francisco, 94143, USA.
Genetics. 2005 May;170(1):447-64. doi: 10.1534/genetics.104.038612. Epub 2005 Mar 21.
We examine the efficiency of different genotyping and phenotyping strategies in inbred line crosses from an information perspective. This provides a mathematical framework for the statistical aspects of QTL experimental design, while guiding our intuition. Our central result is a simple formula that quantifies the fraction of missing information of any genotyping strategy in a backcross. It includes the special case of selectively genotyping only the phenotypic extreme individuals. The formula is a function of the square of the phenotype and the uncertainty in our knowledge of the genotypes at a locus. This result is used to answer a variety of questions. First, we examine the cost-information trade-off as we vary the density of markers and the proportion of extreme phenotypic individuals genotyped. Then we evaluate the information content of selective phenotyping designs and the impact of measurement error in phenotyping. A simple formula quantifies the information content of any combined phenotyping and genotyping design. We extend our results to cover multigenotype crosses, such as the F2 intercross, and multiple QTL models. We find that when the QTL effect is small, any contrast in a multigenotype cross benefits from selective genotyping in the same manner as in a backcross. The benefit remains in the presence of a second unlinked QTL with small effect (explaining <20% of the variance), but diminishes if the second QTL has a large effect. Software for performing power calculations for backcross and F2 intercross designs, incorporating selective genotyping and marker spacing, is available from http://www.biostat.ucsf.edu/sen.
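The selective-genotyping special case can be illustrated with a rough sketch. It is not the paper's general formula and not the authors' software. It assumes that the phenotype is standard normal, that the QTL effect is small, that each genotyped individual contributes information proportional to its squared phenotype, and that genotypes are known without uncertainty at the locus. Under those assumptions, genotyping only individuals with |y| > c retains a fraction E[y² · 1(|y| > c)] / E[y²] = α + 2cφ(c) of the information, where α is the proportion genotyped and φ is the standard normal density. The function name `retained_information` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: the ASSUMED phenotype distribution.
_STD_NORMAL = NormalDist()


def retained_information(fraction_genotyped: float) -> float:
    """Fraction of information kept when only the phenotypic extremes
    (equal-sized upper and lower tails) are genotyped.

    Assumptions (not taken from the abstract's formula itself):
    standard normal phenotype, small QTL effect, information per
    individual proportional to y**2, no genotype uncertainty.
    """
    if not 0.0 <= fraction_genotyped <= 1.0:
        raise ValueError("fraction_genotyped must lie in [0, 1]")
    if fraction_genotyped == 0.0:
        return 0.0
    if fraction_genotyped == 1.0:
        return 1.0
    # Cutoff c such that P(|y| > c) equals the fraction genotyped.
    c = _STD_NORMAL.inv_cdf(1.0 - fraction_genotyped / 2.0)
    # E[y^2 ; |y| > c] = 2 * (c * pdf(c) + 1 - cdf(c)) = alpha + 2 * c * pdf(c)
    return fraction_genotyped + 2.0 * c * _STD_NORMAL.pdf(c)


if __name__ == "__main__":
    for alpha in (0.1, 0.2, 0.5, 1.0):
        kept = retained_information(alpha)
        print(f"genotyped {alpha:.0%}: retained {kept:.3f}, missing {1 - kept:.3f}")
```

Under these assumptions, genotyping the most extreme half of the individuals retains about 0.93 of the information. This is the kind of cost-information trade-off the abstract describes.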