Department of Genetics, Stanford University, Stanford, CA 94305, USA; Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305, USA.
Am J Hum Genet. 2013 Aug 8;93(2):278-88. doi: 10.1016/j.ajhg.2013.06.020. Epub 2013 Aug 1.
Local-ancestry inference is an important step in the genetic analysis of fully sequenced human genomes. Current methods can only detect continental-level ancestry (i.e., European versus African versus Asian) accurately even when using millions of markers. Here, we present RFMix, a powerful discriminative modeling approach that is faster (30×) and more accurate than existing methods. We accomplish this by using a conditional random field parameterized by random forests trained on reference panels. RFMix is capable of learning from the admixed samples themselves to boost performance and autocorrect phasing errors. RFMix shows high sensitivity and specificity in simulated Hispanics/Latinos and African Americans and admixed Europeans, Africans, and Asians. Finally, we demonstrate that African Americans in HapMap contain modest (but nonzero) levels of Native American ancestry (0.4%).
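The idea of combining per-window classifier output with a chain model can be illustrated with a minimal sketch. It is not the RFMix implementation: it assumes the genome has already been cut into windows, that some classifier (random forests in the abstract) has produced an ancestry probability vector for each window, and that the chain is smoothed by Viterbi decoding with a single fixed switch probability. The function name `viterbi_smooth` and the parameter `switch_prob` are hypothetical, and the probabilities are made-up illustrative values, not data from the paper.

```python
import math


def viterbi_smooth(window_probs, switch_prob=0.05):
    """Return the most likely ancestry index per window.

    window_probs: one list of ancestry probabilities per window
                  (assumed here to come from a per-window classifier).
    switch_prob:  assumed probability of an ancestry change between
                  adjacent windows (a hypothetical fixed value).
    Requires at least two ancestry states.
    """
    n_states = len(window_probs[0])
    stay = math.log(1.0 - switch_prob)
    switch = math.log(switch_prob / (n_states - 1))
    eps = 1e-12  # guards against log(0)

    # Log-score of the best path ending in each state at the first window.
    score = [math.log(p + eps) for p in window_probs[0]]
    back = []  # back[t][k] = best previous state for state k at window t+1

    for probs in window_probs[1:]:
        new_score, pointers = [], []
        for k in range(n_states):
            candidates = [
                score[j] + (stay if j == k else switch)
                for j in range(n_states)
            ]
            best_prev = max(range(n_states), key=lambda j: candidates[j])
            new_score.append(candidates[best_prev] + math.log(probs[k] + eps))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Illustrative input: two ancestries, seven windows, one noisy window (index 2).
example_probs = [
    [0.9, 0.1], [0.8, 0.2], [0.45, 0.55], [0.9, 0.1],
    [0.1, 0.9], [0.2, 0.8], [0.1, 0.9],
]
naive_calls = [max(range(2), key=lambda k: p[k]) for p in example_probs]
smoothed_calls = viterbi_smooth(example_probs)
```

In this toy input, calling each window independently flips ancestry at the single weakly supported window, whereas the chain model keeps it with its neighbors and places only one ancestry switch along the segment.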