Macgregor Stuart, Knott Sara A, Visscher Peter M
Institute of Evolutionary Biology, University of Edinburgh, West Mains Road, Edinburgh, United Kingdom.
Twin Res Hum Genet. 2006 Feb;9(1):9-16. doi: 10.1375/183242706776403000.
Linkage analysis (parametric or nonparametric) is commonly applied to related individuals affected by a disease to identify chromosomal regions likely to contain a disease locus. In complex diseases, the incomplete relationship between phenotype and genotype can be modeled with two parameters: a phenocopy parameter, the probability that an individual is affected given that they do not carry the disease mutation of interest, and a nonpenetrance parameter, the probability that an individual is unaffected given that they do carry it. If the linkage phase between multiple markers and a putative disease locus is known, then haplotypes carrying the mutation can, in principle, be identified by comparing the chromosome segments that are shared identical-by-descent (IBD) across affected individuals. We consider here the effect of a nonzero phenocopy rate on the linkage peak and hence on the identification of disease haplotypes that are shared IBD between affected individuals. We show, by theory and computer simulation, that in diseases with a nonzero phenocopy rate, the chromosomal regions identified may not include the true disease locus. We use a LOD-1 confidence interval for a widely used nonparametric linkage statistic and find that in small to moderate samples this interval may be inappropriate. We give specific examples of complex diseases in which phenocopy rates are nonnegligible. The success of further work to identify the causal mutations underlying the linkage peaks in these diseases will depend on researchers allowing for the presence of phenocopies by examining appropriately wide regions around the initial positive linkage finding.
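The two quantities discussed above can be illustrated with a minimal sketch; it is not the authors' method. The first function applies Bayes' rule to the phenocopy and nonpenetrance parameters to give the probability that an affected individual actually carries the mutation. It assumes a single disease mutation with a population carrier frequency `carrier_freq` (a hypothetical parameter not given in the abstract) and ignores family structure and the ascertainment of multiplex pedigrees. The second function computes a grid-based LOD-1 interval, the contiguous region around the peak where the LOD score stays within 1 unit of its maximum. The abstract does not name the nonparametric statistic, so the LOD curve here is hypothetical input on a LOD scale.

```python
from typing import Sequence, Tuple


def prob_carrier_given_affected(carrier_freq: float, nonpenetrance: float,
                                phenocopy: float) -> float:
    """Bayes' rule for P(carries mutation | affected).

    Assumed model (illustrative only):
      P(affected | carrier)     = 1 - nonpenetrance
      P(affected | non-carrier) = phenocopy
    """
    affected_and_carrier = carrier_freq * (1.0 - nonpenetrance)
    affected_and_noncarrier = (1.0 - carrier_freq) * phenocopy
    return affected_and_carrier / (affected_and_carrier + affected_and_noncarrier)


def lod_minus_one_interval(positions: Sequence[float],
                           lods: Sequence[float]) -> Tuple[float, float]:
    """Return the grid positions bounding the contiguous region around the
    LOD peak where LOD >= max(LOD) - 1 (no interpolation between points)."""
    if not lods or len(positions) != len(lods):
        raise ValueError("positions and lods must be non-empty and equal length")
    peak = max(range(len(lods)), key=lambda i: lods[i])
    threshold = lods[peak] - 1.0
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1  # extend leftward while still within 1 LOD unit of the peak
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1  # extend rightward likewise
    return positions[left], positions[right]


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    p = prob_carrier_given_affected(carrier_freq=0.01, nonpenetrance=0.2,
                                    phenocopy=0.01)
    print(round(p, 3))  # 0.447: most affected individuals are phenocopies here

    # Hypothetical LOD curve on a grid of map positions (cM).
    positions = [0, 5, 10, 15, 20, 25, 30]
    lods = [0.5, 1.8, 2.9, 3.4, 2.6, 1.2, 0.3]
    print(lod_minus_one_interval(positions, lods))  # (10, 20)
```

With these illustrative values, a phenocopy rate of only 1% already means that fewer than half of affected individuals carry the mutation, which is why haplotypes shared among affecteds can point away from the true locus. The grid-based interval is deliberately simple; as the abstract notes, a LOD-1 interval may be too narrow in small to moderate samples.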