Public Health Sciences, University of Edinburgh, Edinburgh, UK.
BMC Genomics. 2010 Feb 25;11:139. doi: 10.1186/1471-2164-11-139.
Estimating genome-wide homozygosity from genomic data is attracting growing research interest. The aim of this study was to compare methods for estimating individual homozygosity-by-descent from human genome-wide scans rather than from genealogies. We considered the four most commonly used methods and investigated their applicability to single-nucleotide polymorphism (SNP) data, both in a simulation study and in genotyped human data. A total of 986 inhabitants of the isolated island of Vis, Croatia (where inbreeding is present, but no pedigree-based inbreeding was observed at the level of F > 0.0625) were included in this study. All individuals were genotyped with the Illumina HumanHap300 array with 317,503 SNP markers.
Simulation data suggested that multi-point FEstim is the method most strongly correlated with true homozygosity-by-descent. Correlation coefficients between the homozygosity-by-descent estimates were high, but only for inbred individuals; the single-point measures were almost perfectly correlated with one another.
Deciding who is truly inbred is a methodological challenge in which multi-point approaches can be very helpful, once the set of SNP markers has been filtered to remove linkage disequilibrium. Using several methodological approaches, and hence several homozygosity measures, can help to distinguish between homozygosity-by-state and homozygosity-by-descent in studies investigating the effects of genomic autozygosity on human health.
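To make the distinction between homozygosity-by-state and an inbreeding-type estimate concrete, the following minimal sketch computes two single-point quantities for one individual: the raw proportion of homozygous genotypes, and a method-of-moments excess-homozygosity coefficient F = (O - E) / (N - E), where E is the number of homozygous genotypes expected under Hardy-Weinberg proportions. This is an illustration only. The abstract does not specify which single-point estimators were compared, so the choice of estimator, the function names, and the 0/1/2 genotype coding (with None for a missing call) are all assumptions.

```python
def observed_homozygosity(genotypes):
    """Proportion of called genotypes that are homozygous (by state).

    Assumed coding: 0 or 2 = homozygous, 1 = heterozygous, None = missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(g != 1 for g in called) / len(called)


def excess_homozygosity_f(genotypes, allele_freqs):
    """Method-of-moments estimate F = (O - E) / (N - E).

    O: observed number of homozygous genotypes
    E: number expected under Hardy-Weinberg, sum of 1 - 2p(1 - p)
    N: number of called genotypes
    allele_freqs holds the population frequency p of one allele per SNP.
    """
    observed = 0
    expected = 0.0
    n_called = 0
    for g, p in zip(genotypes, allele_freqs):
        if g is None:  # skip missing calls
            continue
        n_called += 1
        observed += (g != 1)
        expected += 1.0 - 2.0 * p * (1.0 - p)
    if n_called == expected:
        # Happens when every called SNP is monomorphic (p = 0 or 1).
        raise ValueError("no informative SNPs")
    return (observed - expected) / (n_called - expected)


if __name__ == "__main__":
    genotypes = [0, 2, 1, 0]        # hypothetical individual, four SNPs
    freqs = [0.5, 0.5, 0.5, 0.5]    # hypothetical allele frequencies
    print(observed_homozygosity(genotypes))         # 0.75
    print(excess_homozygosity_f(genotypes, freqs))  # 0.5
```

In this toy case three of four genotypes are homozygous by state, but half of the SNPs would be expected to be homozygous by chance alone, so the excess-homozygosity estimate is lower than the raw proportion. Single-point estimators of this kind treat each marker independently. Multi-point approaches such as FEstim additionally use the positions of markers along the genome, which this sketch does not attempt.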