Swiss Tropical and Public Health Institute, Basel, Switzerland.
PLoS One. 2012;7(8):e42496. doi: 10.1371/journal.pone.0042496. Epub 2012 Aug 29.
People living in endemic areas often harbour several malaria infections at once. High-resolution genotyping can distinguish between infections by detecting the presence of different alleles at a polymorphic locus. However, the number of infections may not be counted accurately, since parasites from multiple infections may carry the same allele. We use simulation to determine the circumstances under which the number of observed genotypes is likely to be substantially less than the number of infections present, and we investigate the performance of two methods for estimating the numbers of infections from high-resolution genotyping data. The simulations suggest that the problem is not substantial in most datasets: the disparity between the mean numbers of infections and of observed genotypes was small when there were 20 or more alleles, 20 or more blood samples, a mean number of infections of 6 or less, and a frequency of the most common allele no greater than 20%. The issue of multiple infections carrying the same allele is unlikely to be a major component of the errors in PCR-based genotyping. Simulations also showed that, with heterogeneity in allele frequencies, the observed frequencies are not a good approximation of the true allele frequencies. The first method that we proposed to estimate the numbers of infections assumes that the observed frequencies are a good approximation of the true ones, and hence did poorly in the presence of heterogeneity. In contrast, the second method, by Li et al., estimates both the numbers of infections and the true allele frequencies simultaneously, and produced accurate estimates of the mean number of infections.
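The undercounting effect described in the abstract can be illustrated with a minimal simulation sketch. This is not the authors' simulation code: the function names `poisson` and `simulate`, the choice of 1 plus a Poisson draw for the number of infections per blood sample, and the independent sampling of alleles from fixed frequencies are all assumptions made here for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(n_samples, allele_freqs, mean_infections, seed=0):
    """Return (mean number of infections, mean number of observed genotypes).

    Assumptions (not taken from the paper): every blood sample carries
    1 + Poisson(mean_infections - 1) infections, and each infection draws
    its allele independently from allele_freqs.
    """
    rng = random.Random(seed)
    alleles = range(len(allele_freqs))
    total_infections = 0
    total_genotypes = 0
    for _ in range(n_samples):
        n_inf = 1 + poisson(rng, mean_infections - 1)
        carried = rng.choices(alleles, weights=allele_freqs, k=n_inf)
        total_infections += n_inf
        # Infections sharing an allele are indistinguishable by genotyping,
        # so only the distinct alleles are observed.
        total_genotypes += len(set(carried))
    return total_infections / n_samples, total_genotypes / n_samples


if __name__ == "__main__":
    # 20 equally frequent alleles, mean of 3 infections per sample.
    mean_inf, mean_obs = simulate(1000, [0.05] * 20, 3.0, seed=1)
    print(f"mean infections: {mean_inf:.2f}, mean observed genotypes: {mean_obs:.2f}")
```

Under these assumptions, the gap between the two printed means can be explored by lowering the number of alleles, raising the mean number of infections, or concentrating the frequency on one allele.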