O'Brien John D, Iqbal Zamin, Wendler Jason, Amenga-Etego Lucas
Mathematics Department, Bowdoin College, Brunswick, Maine, United States of America.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, Oxfordshire, United Kingdom.
PLoS Comput Biol. 2016 Jun 30;12(6):e1004824. doi: 10.1371/journal.pcbi.1004824. eCollection 2016 Jun.
We present a rigorous statistical model that infers the structure of P. falciparum mixtures (including the number of strains present, their proportions within the samples, and the amount of unexplained mixture) using whole genome sequence (WGS) data. Applied to simulated data, artificial laboratory mixtures, and field samples, the model provides reasonable inference with as few as 10 reads or 50 SNPs and works efficiently even with much larger data sets. Source code and example data for the model are released as open source. We discuss the possible uses of this model as a window into within-host selection for clinical and epidemiological studies.