Galinsky Kevin, Valim Clarissa, Salmier Arielle, de Thoisy Benoit, Musset Lise, Legrand Eric, Faust Aubrey, Baniecki Mary Lynn, Ndiaye Daouda, Daniels Rachel F, Hartl Daniel L, Sabeti Pardis C, Wirth Dyann F, Volkman Sarah K, Neafsey Daniel E
Department of Biostatistics, Harvard School of Public Health, Boston, MA, 02115, USA.
Department of Immunology and Infectious Disease, Harvard School of Public Health, Boston, MA, 02115, USA.
Malar J. 2015 Jan 19;14:4. doi: 10.1186/1475-2875-14-4.
Complex malaria infections are defined as those containing more than one genetically distinct lineage of Plasmodium parasite. Complexity of infection (COI) is a useful parameter to estimate from patient blood samples because it is associated with clinical outcome, epidemiology and disease transmission rate. This manuscript describes COIL, a likelihood-based method for estimating COI from a panel of bi-allelic genotyping assays.
COIL assumes that distinct parasite lineages in complex infections are unrelated and that genotyped loci do not exhibit significant linkage disequilibrium. Using the population minor allele frequency (MAF) of the genotyped loci, COIL uses the binomial distribution to estimate the likelihood of a COI level given the prevalence of observed monomorphic or polymorphic genotypes within each sample.
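The likelihood calculation described above can be sketched as follows. This is a minimal illustration rather than the COIL program itself. It assumes that, for a locus with minor allele frequency p and a COI of n unrelated lineages, a sample appears monomorphic for the minor allele with probability p^n, monomorphic for the major allele with probability (1 - p)^n, and polymorphic otherwise. The function names, the call labels ("major", "minor", "het") and the `max_coi` cap are hypothetical choices made for this example.

```python
import math

def log_likelihood(coi, calls, mafs):
    """Log-likelihood of a COI value given per-locus calls and population MAFs.

    calls: one of "major", "minor", "het" per locus, or None for a missing call.
    mafs:  population minor allele frequency for each locus.
    Loci are treated as independent (no linkage disequilibrium).
    """
    total = 0.0
    for call, p in zip(calls, mafs):
        if call is None:
            continue  # a missing genotype contributes no information
        p_minor = p ** coi          # every lineage carries the minor allele
        p_major = (1.0 - p) ** coi  # every lineage carries the major allele
        if call == "minor":
            prob = p_minor
        elif call == "major":
            prob = p_major
        else:
            prob = 1.0 - p_minor - p_major  # at least one of each allele
        if prob <= 1e-12:
            return float("-inf")  # e.g. a polymorphic call is impossible at COI 1
        total += math.log(prob)
    return total

def estimate_coi(calls, mafs, max_coi=5):
    """Return the COI in 1..max_coi with the highest log-likelihood."""
    scores = {n: log_likelihood(n, calls, mafs) for n in range(1, max_coi + 1)}
    return max(scores, key=scores.get)
```

For example, `estimate_coi(["het"] * 12 + ["major"] * 6 + ["minor"] * 6, [0.5] * 24)` compares the candidate COI values 1 to 5 for a sample in which half of 24 loci appear polymorphic. The sketch ignores genotyping error and unequal lineage proportions, both of which a practical tool would need to consider.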
COIL reliably estimates COI up to a level of three or five with at least 24 or 96 unlinked genotyped loci, respectively, as determined by in silico simulation and empirical validation. Evaluating COI levels greater than five in patient samples may require a very large collection of genotype data, making sequencing a more cost-effective approach for evaluating COI where disease transmission is extremely high. Performance of the method is positively correlated with the MAF of the genotyped loci. COI estimates from existing SNP genotype datasets create a more detailed portrait of disease than analyses based simply on the number of polymorphic genotypes observed within samples.
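One way to explore how accuracy depends on the number of loci and their MAF is a small in silico simulation. The sketch below is an assumption-laden illustration, not the simulation used in the study: it assumes a perfect assay that detects every allele present, unrelated lineages, independent loci, and a single shared MAF. All function names and default parameters are hypothetical.

```python
import math
import random

def simulate_calls(coi, mafs, rng):
    """Draw one allele per lineage per locus and report what a perfect assay would see."""
    calls = []
    for p in mafs:
        minor_count = sum(rng.random() < p for _ in range(coi))
        if minor_count == 0:
            calls.append("major")
        elif minor_count == coi:
            calls.append("minor")
        else:
            calls.append("het")  # both alleles present in the sample
    return calls

def estimate_coi(calls, mafs, max_coi=8):
    """Maximum-likelihood COI under the monomorphic/polymorphic model."""
    best, best_ll = 1, float("-inf")
    for n in range(1, max_coi + 1):
        ll = 0.0
        for call, p in zip(calls, mafs):
            mono = {"minor": p ** n, "major": (1.0 - p) ** n}
            prob = mono.get(call, 1.0 - mono["minor"] - mono["major"])
            ll += math.log(prob) if prob > 1e-12 else float("-inf")
        if ll > best_ll:
            best, best_ll = n, ll
    return best

def accuracy(true_coi, n_loci, maf=0.4, trials=200, seed=0):
    """Fraction of simulated samples whose estimated COI equals the true COI."""
    rng = random.Random(seed)
    mafs = [maf] * n_loci
    hits = sum(
        estimate_coi(simulate_calls(true_coi, mafs, rng), mafs) == true_coi
        for _ in range(trials)
    )
    return hits / trials
```

Calling `accuracy(3, 24)` and `accuracy(3, 96)`, or varying `maf`, gives a rough sense of how the number of loci and their allele frequencies affect the estimate under these idealized assumptions.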
The capacity to reliably estimate COI from a genome-wide panel of SNP genotypes provides a potentially more accurate alternative to methods relying on PCR amplification of a small number of loci for estimating COI. This approach will also increase the number of applications of SNP genotype data, providing additional motivation to employ SNP barcodes for studies of disease epidemiology or control measure efficacy. The COIL program is available for download from GitHub, and users may also upload their SNP genotype data to a web interface for simple and efficient determination of sample COI.