Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA; Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.
Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA.
Am J Hum Genet. 2014 Feb 6;94(2):257-67. doi: 10.1016/j.ajhg.2014.01.005.
The use of large pedigrees is an effective design for identifying rare functional variants affecting heritable traits. Cost-effective studies using sequence data can be achieved via pedigree-based genotype imputation in which some subjects are sequenced and missing genotypes are inferred on the remaining subjects. Because of high cost, it is important to carefully prioritize subjects for sequencing. Here, we introduce a statistical framework that enables systematic comparison among subject-selection choices for sequencing. We introduce a metric "local coverage," which allows the use of inferred inheritance vectors to measure genotype-imputation ability specifically in a region of interest, such as one with prior evidence of linkage. In the absence of linkage information, we can instead use a "genome-wide coverage" metric computed with the pedigree structure. These metrics enable the development of a method that identifies efficient selection choices for sequencing. As implemented in GIGI-Pick, this method also flexibly allows initial manual selection of subjects and optimizes selections within the constraint that only some subjects might be available for sequencing. In the present study, we used simulations to compare GIGI-Pick with PRIMUS, ExomePicks, and common ad hoc methods of selecting subjects. In genotype imputation of both common and rare alleles, GIGI-Pick substantially outperformed all other methods considered and had the added advantage of incorporating prior linkage information. We also used a real pedigree to demonstrate the utility of our approach in identifying causal mutations. Our work enables prioritization of subjects for sequencing to facilitate dissection of the genetic basis of heritable traits.
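As a rough illustration of how a coverage metric can drive subject selection, the sketch below scores a candidate set of sequenced subjects by the fraction of allele copies in the pedigree whose founder of origin is carried by at least one sequenced subject, and then adds subjects greedily. This is not the GIGI-Pick implementation: the toy pedigree, the founder genome labels in `FGLS`, the `coverage` definition, and the greedy search in `greedy_pick` are all assumptions made for illustration. The sketch also uses a single fixed inheritance configuration at one locus, whereas inferred inheritance vectors are uncertain in practice.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical toy data: at one locus, each subject carries two founder genome
# labels (FGLs), as would be implied by one realization of an inheritance vector.
FGLS: Dict[str, Tuple[int, int]] = {
    "grandfather": (1, 2),
    "grandmother": (3, 4),
    "father": (1, 3),
    "mother": (5, 6),
    "child_a": (1, 5),
    "child_b": (3, 6),
}


def coverage(selected: Set[str], fgls: Dict[str, Tuple[int, int]]) -> float:
    """Fraction of all allele copies whose FGL is carried by a sequenced subject.

    Assumed, simplified stand-in for a coverage metric; it ignores phase
    ambiguity and uncertainty in the inheritance vectors.
    """
    observed = {label for subject in selected for label in fgls[subject]}
    total = 2 * len(fgls)
    covered = sum(label in observed for pair in fgls.values() for label in pair)
    return covered / total


def greedy_pick(
    fgls: Dict[str, Tuple[int, int]],
    n_select: int,
    available: Optional[Iterable[str]] = None,
    preselected: Iterable[str] = (),
) -> List[str]:
    """Greedily add the available subject that most increases coverage.

    `preselected` mimics an initial manual selection; `available` mimics the
    constraint that only some subjects can be sequenced.
    """
    candidates = set(fgls) if available is None else set(available)
    chosen = list(preselected)
    while len(chosen) < n_select:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        remaining = sorted(candidates - set(chosen))
        if not remaining:
            break
        best = max(remaining, key=lambda s: coverage(set(chosen) | {s}, fgls))
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    picks = greedy_pick(FGLS, n_select=2)
    print(picks, coverage(set(picks), FGLS))
```

In this toy pedigree, the first pick is the subject whose two founder labels are shared by the most relatives, and the second pick is the one that adds the most labels not yet observed. A greedy search is only one possible heuristic and is not guaranteed to find the best selection.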