Feenstra B, Greenberg D A, Hodge S E
Division of Statistical Genetics, Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, N.Y., USA.
Hum Hered. 2004;57(2):100-8. doi: 10.1159/000077547.
The human recombination fraction (RF) can differ between males and females, but investigators do not always know which disease genes lie in genomic regions with large RF sex differences. Knowledge of RF sex differences contributes to our understanding of basic biology and can increase the power of a linkage study, improve gene localization, and provide clues to possible imprinting. One way to detect these differences is to use lod scores. In this study we focused on detecting RF sex differences and answered the following questions, in both phase-known and phase-unknown matings: (1) How large a sample is needed to detect an RF sex difference? (2) What are the "optimal" proportions of paternally vs. maternally informative matings? (3) Does ascertaining nonoptimal proportions of paternally or maternally informative matings lead to ascertainment bias? Our results were as follows: (1) We calculated expected lod scores (ELODs) under two conditions: "unconstrained," allowing sex-specific RF parameters (θ_female, θ_male); and "constrained," requiring θ_female = θ_male. We then examined ΔELOD (≡ the difference between the maximized constrained and unconstrained ELODs) and calculated the minimum sample sizes required to achieve a statistically significant ΔELOD. For large RF sex differences, samples as small as 10 to 20 fully informative matings can achieve statistical significance. We give general sample size guidelines for detecting RF differences in informative phase-known and phase-unknown matings. (2) We defined p as the proportion of paternally informative matings in the dataset, and the optimal proportion p° as the value of p that maximizes ΔELOD. We determined that, surprisingly, p° does not necessarily equal 1/2, although it does fall between approximately 0.4 and 0.6 in most situations. (3) We showed that if p in a sample deviates from its optimal value, no bias is introduced (asymptotically) into the maximum likelihood estimates of θ_female and θ_male, even though the ELOD is reduced (see point 2). This fact is important because investigators often cannot control the proportions of paternally and maternally informative families. In conclusion, it is possible to reliably detect sex differences in recombination fraction.
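The ΔELOD idea can be illustrated with a minimal sketch for the simplest case only. It is not the authors' implementation. It assumes single, fully informative, phase-known meioses (not whole matings with several offspring), true recombination fractions strictly between 0 and 0.5, and a significance criterion of 3.841 (the 0.95 quantile of a 1-df chi-square) applied to 2 ln(10) times the expected lod difference. All function names and the grid search for the best proportion are hypothetical choices made for illustration.

```python
import math

# Assumed significance criterion: 0.95 quantile of chi-square with 1 df.
CHI2_1DF_95 = 3.841


def elod_per_meiosis(theta_true, theta_eval):
    """Expected lod of one phase-known informative meiosis.

    The meiosis is recombinant with probability theta_true; the lod is
    evaluated at theta_eval against the null value 0.5.
    """
    return (theta_true * math.log10(2.0 * theta_eval)
            + (1.0 - theta_true) * math.log10(2.0 * (1.0 - theta_eval)))


def delta_elod(theta_male, theta_female, p):
    """Per-meiosis ELOD difference: unconstrained minus constrained.

    p is the proportion of paternally informative meioses.
    """
    # Unconstrained: each sex-specific ELOD is maximized at its true value.
    unconstrained = (p * elod_per_meiosis(theta_male, theta_male)
                     + (1.0 - p) * elod_per_meiosis(theta_female, theta_female))
    # Constrained: a single theta; the expected lod is maximized at the
    # p-weighted average of the two true values.
    theta_common = p * theta_male + (1.0 - p) * theta_female
    constrained = (p * elod_per_meiosis(theta_male, theta_common)
                   + (1.0 - p) * elod_per_meiosis(theta_female, theta_common))
    return unconstrained - constrained


def min_meioses(theta_male, theta_female, p, crit=CHI2_1DF_95):
    """Smallest n for which the expected test statistic reaches crit."""
    if theta_male == theta_female:
        raise ValueError("no sex difference to detect")
    d = delta_elod(theta_male, theta_female, p)
    return math.ceil(crit / (2.0 * math.log(10.0) * d))


def best_p(theta_male, theta_female, steps=99):
    """Grid search for the proportion p that maximizes the ELOD difference."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    return max(grid, key=lambda p: delta_elod(theta_male, theta_female, p))


if __name__ == "__main__":
    print(min_meioses(0.05, 0.30, 0.5))
    print(best_p(0.05, 0.30))
```

Under these assumptions the per-meiosis difference is zero when the two recombination fractions are equal and positive otherwise, so the required sample size shrinks as the sex difference grows. Phase-unknown matings need a different likelihood and are not covered by this sketch.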