O'Brien John D, Amenga-Etego Lucas, Li Ruiqi
Department of Mathematics, Bowdoin College, 8600 College Station, Brunswick, ME, USA.
Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, UK.
Malar J. 2016 Sep 15;15:473. doi: 10.1186/s12936-016-1531-z.
The advent of whole-genome sequencing has generated increased interest in modelling the structure of strain mixture within clinical infections of Plasmodium falciparum. The life cycle of the parasite implies that the mixture of multiple strains within an infected individual is related to the out-crossing rate across populations, making methods for measuring this process in situ central to understanding the genetic epidemiology of the disease.
This paper derives a set of new estimators for inferring inbreeding coefficients using whole-genome sequence read count data from P. falciparum clinical samples. These estimators provide resources for assessing within-sample mixture that connect to the extensive literature in population genetics and conservation ecology. Features of the P. falciparum genome mean that standard methods for estimating inbreeding coefficients and related F-statistics cannot be used directly. After reviewing an initial effort to estimate the inbreeding coefficient within clinical isolates of P. falciparum, several generalizations using both frequentist and Bayesian approaches are provided. A simpler, more intuitive frequentist estimator is shown to have nearly identical properties to the initial estimator, both in simulation and in real data sets. The Bayesian approach connects these estimates to the Balding-Nichols model, a mainstay of genetic epidemiology, and offers a possible framework for more complex modelling. A simulation study shows strong performance for all estimators with as few as ten variants. Application to samples from the PF3K data set indicates significant between-country variation in within-sample mixture. Finally, a comparison with results from a recent mixture model for within-sample strain mixture shows that inbreeding coefficients provide a strong proxy for these more complex models.
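As a rough illustration of what an inbreeding-coefficient estimate from read count data involves, the sketch below computes a ratio-of-sums statistic of the form F = 1 - H_W / H_S. Here H_W is the within-sample heterozygosity implied by each variant's observed alternate-allele read fraction, and H_S is the heterozygosity implied by the population allele frequency. This is an illustrative assumption, not the estimator derived in this paper and not the pfmix API: the function name and its inputs are hypothetical, and the sketch ignores read-sampling noise and sequencing error, which the paper's estimators are designed to handle. A clonal (unmixed) sample yields F = 1, and a sample whose read fractions match the population frequencies yields F = 0.

```python
from typing import Sequence


def fws_ratio_sketch(
    ref_counts: Sequence[int],
    alt_counts: Sequence[int],
    pop_freqs: Sequence[float],
) -> float:
    """Hypothetical sketch: F = 1 - (summed within-sample het.) / (summed population het.).

    ref_counts, alt_counts: read counts for the reference/alternate allele at each variant.
    pop_freqs: assumed population frequency of the alternate allele at each variant.
    Not the paper's estimator; read-sampling noise and sequencing error are ignored.
    """
    if not (len(ref_counts) == len(alt_counts) == len(pop_freqs)):
        raise ValueError("inputs must have the same length")

    h_within = 0.0  # sum over variants of 2q(1-q), q = observed alt read fraction
    h_pop = 0.0     # sum over the same variants of 2p(1-p), p = population frequency
    for ref, alt, p in zip(ref_counts, alt_counts, pop_freqs):
        depth = ref + alt
        if depth == 0:
            continue  # skip variants with no coverage in this sample
        q = alt / depth
        h_within += 2.0 * q * (1.0 - q)
        h_pop += 2.0 * p * (1.0 - p)

    if h_pop == 0.0:
        raise ValueError("no informative (polymorphic, covered) variants")
    return 1.0 - h_within / h_pop


# A clonal sample: every variant is covered by reads of a single allele.
print(fws_ratio_sketch([10, 0, 20], [0, 15, 0], [0.5, 0.3, 0.2]))  # 1.0
```

Summing heterozygosities across variants before taking the ratio, rather than averaging per-variant ratios, keeps low-frequency variants (where 2p(1-p) is close to zero) from dominating the estimate.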
This paper provides a set of methods for estimating inbreeding coefficients within P. falciparum samples from whole-genome sequence data, supported by simulation studies and empirical examples. It includes a substantially simpler estimator with statistical properties similar to those of the estimator in current use. These methods will also be applicable to other species with similar life cycles. Implementations of the methods described are available in an open-source R package, pfmix. Estimates for the PF3K public data release are provided as part of this resource.