Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115
Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142
Genetics. 2019 Aug;212(4):1337-1351. doi: 10.1534/genetics.119.302120. Epub 2019 Jun 17.
Understanding the relatedness of individuals within or between populations is a common goal in biology. Increasingly, relatedness features in genetic epidemiology studies of pathogens. These studies are relatively new compared to those in humans and other organisms, but they are important for designing interventions and understanding pathogen transmission. Only recently have researchers begun to routinely apply relatedness to apicomplexan eukaryotic malaria parasites, and to date they have used a range of different approaches on an ad hoc basis. Therefore, it remains unclear how to compare different studies and which measures to use. Here, we systematically compare measures based on identity-by-state (IBS) and identity-by-descent (IBD) using a globally diverse data set of malaria parasites and provide marker requirements for estimates based on IBD. We formally show that the informativeness of polyallelic markers for relatedness inference is maximized when alleles are equifrequent. Estimates based on IBS are sensitive to allele frequencies, which vary across populations and by experimental design. For portability across studies, we therefore recommend estimates based on IBD. To generate estimates with errors below an arbitrary threshold of 0.1, we recommend ∼100 polyallelic or 200 biallelic markers. These marker requirements are immediately applicable to haploid malaria parasites and other haploid eukaryotes. Confidence intervals (CIs) facilitate comparison when different marker sets are used. This is the first attempt to provide a rigorous analysis of the reliability of, and requirements for, relatedness inference in malaria genetic epidemiology. We hope it will provide a basis for statistically informed prospective study design and surveillance strategies.
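The contrast between IBS and IBD, and the claim that informativeness peaks when alleles are equifrequent, can be illustrated with a toy model. The sketch below is a minimal illustration under stated assumptions, not the estimator used in the study (the abstract does not specify it): markers are assumed unlinked, genotypes haploid and error-free, and allele frequencies known. Under these assumptions, two samples with relatedness r share an allele at marker j with probability r + (1 − r)h_j, where h_j = Σ p_i² is the chance that unrelated samples match. Raw IBS therefore tracks h_j, which depends on allele frequencies, while a simple method-of-moments estimate of r corrects for it. Because h_j ≥ 1/K for K alleles, with equality only when all p_i = 1/K, the per-marker signal 1 − h_j is largest for equifrequent alleles. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def homozygosity(freqs: Sequence[float]) -> float:
    """Chance that two unrelated haploid samples match at a marker: sum of p_i^2."""
    return sum(p * p for p in freqs)


def ibs_fraction(a: Sequence[int], b: Sequence[int]) -> float:
    """Identity-by-state: fraction of markers at which two haploid genotypes match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def moment_relatedness(a: Sequence[int], b: Sequence[int],
                       freqs_per_marker: Sequence[Sequence[float]]) -> float:
    """Toy method-of-moments estimate of r (not the paper's method).

    Assumes unlinked, error-free markers with known frequencies and the model
    P(match at marker j) = r + (1 - r) * h_j. The estimate may fall outside [0, 1].
    """
    shared = sum(x == y for x, y in zip(a, b))
    h_total = sum(homozygosity(f) for f in freqs_per_marker)
    n = len(a)
    if n - h_total <= 0:
        raise ValueError("markers carry no information (all monomorphic)")
    return (shared - h_total) / (n - h_total)


def simulate_pair(freqs_per_marker: Sequence[Sequence[float]], r: float,
                  rng: random.Random) -> Tuple[List[int], List[int]]:
    """Simulate two haploid genotypes: each marker is IBD with probability r
    (second sample copies the first), otherwise drawn independently."""
    a, b = [], []
    for freqs in freqs_per_marker:
        alleles = list(range(len(freqs)))
        x = rng.choices(alleles, weights=freqs)[0]
        y = x if rng.random() < r else rng.choices(alleles, weights=freqs)[0]
        a.append(x)
        b.append(y)
    return a, b


if __name__ == "__main__":
    rng = random.Random(1)
    n_markers = 200
    panels = {
        "skewed biallelic (0.9/0.1)": [[0.9, 0.1]] * n_markers,
        "equifrequent biallelic": [[0.5, 0.5]] * n_markers,
    }
    for label, freqs in panels.items():
        # An unrelated pair (r = 0): IBS reflects allele frequencies, r-hat does not.
        a, b = simulate_pair(freqs, r=0.0, rng=rng)
        print(label, "IBS:", round(ibs_fraction(a, b), 2),
              "r-hat:", round(moment_relatedness(a, b, freqs), 2))
```

For an unrelated pair, IBS is expected to be about 0.82 on the skewed panel and about 0.5 on the equifrequent one, so the same biological relationship yields different IBS values depending on the marker set. The moment estimate centres on 0 in both cases, but it is noisier on the skewed panel because each marker contributes a smaller 1 − h_j.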