Staples Jeffrey, Qiao Dandi, Cho Michael H, Silverman Edwin K, Nickerson Deborah A, Below Jennifer E
Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
Channing Division of Network Medicine, Harvard School of Public Health, Boston, MA 02115, USA; Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.
Am J Hum Genet. 2014 Nov 6;95(5):553-64. doi: 10.1016/j.ajhg.2014.10.005. Epub 2014 Oct 30.
Understanding and correctly utilizing relatedness among samples is essential for genetic analysis; however, managing sample records and pedigrees can often be error prone and incomplete. Data sets ascertained by random sampling often harbor cryptic relatedness that can be leveraged in genetic analyses for maximizing power. We have developed a method that uses genome-wide estimates of pairwise identity by descent to identify families and quickly reconstruct and score all possible pedigrees that fit the genetic data by using up to third-degree relatives, and we have included it in the software package PRIMUS (Pedigree Reconstruction and Identification of the Maximally Unrelated Set). Here, we validate its performance on simulated, clinical, and HapMap pedigrees. Among these samples, we demonstrate that PRIMUS can verify reported pedigree structures and identify cryptic relationships. Finally, we show that PRIMUS reconstructed pedigrees, all of which were previously unknown, for 203 families from a cohort collected in Starr County, TX (1,890 samples).
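The family-identification step described in the abstract can be pictured as a graph problem: samples are nodes, and any pair whose estimated genome-wide IBD proportion is at or above a third-degree cutoff is joined by an edge, so each connected component is a candidate family. The sketch below is a minimal illustration of that idea and is not taken from PRIMUS itself. The function name `family_networks`, the input layout, and the cutoff of 0.09375 (halfway between the expected proportions of 0.125 for third-degree and 0.0625 for fourth-degree relatives) are all assumptions made for this example.

```python
from collections import defaultdict

# Assumed cutoff: midpoint between the expected IBD proportion of
# third-degree (0.125) and fourth-degree (0.0625) relatives.
THIRD_DEGREE_CUTOFF = 0.09375


def family_networks(samples, pairwise_ibd, cutoff=THIRD_DEGREE_CUTOFF):
    """Group samples into candidate families (hypothetical helper).

    samples      -- iterable of sample IDs
    pairwise_ibd -- dict mapping (id_a, id_b) to the estimated
                    proportion of the genome shared IBD
    Returns a list of sorted ID lists, largest family first.
    """
    samples = list(samples)
    parent = {s: s for s in samples}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Join every pair related at third degree or closer.
    for (a, b), proportion in pairwise_ibd.items():
        if proportion >= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    # Collect the connected components.
    groups = defaultdict(set)
    for s in samples:
        groups[find(s)].add(s)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Toy data (invented for illustration, not from the paper).
toy_samples = ["A", "B", "C", "D", "E", "F"]
toy_ibd = {
    ("A", "B"): 0.50,   # first-degree range
    ("B", "C"): 0.25,   # second-degree range
    ("C", "D"): 0.03,   # below the cutoff, treated as unrelated
    ("D", "E"): 0.13,   # third-degree range
}
toy_families = family_networks(toy_samples, toy_ibd)
```

With the toy data, `A`, `B`, and `C` fall into one network, `D` and `E` into another, and `F` stays a singleton. Reconstructing and scoring the possible pedigrees within each network, which is the core of the published method, is beyond the scope of this sketch.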