Li Xiaohong, Foulkes Andrea S, Yucel Recai M, Rich Stephen M
University of Massachusetts, Amherst, MA, USA.
Stat Appl Genet Mol Biol. 2007;6:Article33. doi: 10.2202/1544-6115.1321. Epub 2007 Nov 19.
Characterizing genetic variability in the human pathogenic Plasmodium species, the group of parasites that cause malaria, may have broad global health implications. Specifically, discerning the combinations of mutations that lead to viable parasites and the population-level frequencies of these clonal sequences will allow for targeted vaccine development and individualized treatment choices. This presents an analytical challenge, however, since haplotypic phase (i.e., the alignment of bases on a single DNA strand) is generally unobservable in multiply infected individuals. This manuscript describes an expectation-maximization (EM) approach to maximum likelihood estimation of haplotype frequencies in this missing-data setting. The approach is applied to a cohort of N=341 malaria-infected children in Uganda, Cameroon and Sudan to characterize regional differences. A simulation study is also presented to characterize method performance and assess sensitivity to distributional assumptions.
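To make the missing-data idea concrete, the following is a minimal sketch of an EM iteration for haplotype frequencies when phase is unobserved. It is not the authors' model, whose details the abstract does not give. It assumes, purely for illustration, biallelic sites, exactly two clones per infected individual, and an observation code of 0 or 1 when only that allele is detected at a site and 2 when both are detected. The function names `compatible_pairs` and `em_haplotype_freqs` are hypothetical.

```python
from itertools import product


def compatible_pairs(obs):
    """List ordered haplotype pairs consistent with one individual's data.

    obs holds one code per site: 0 or 1 if only that allele was detected,
    2 if both alleles were detected (assumed two-clone infection).
    """
    per_site = []
    for code in obs:
        if code == 2:
            per_site.append([(0, 1), (1, 0)])  # phase is ambiguous here
        else:
            per_site.append([(code, code)])    # both clones share the allele
    pairs = []
    for combo in product(*per_site):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        pairs.append((h1, h2))
    return pairs


def em_haplotype_freqs(observations, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM under the assumptions above."""
    n_sites = len(observations[0])
    haps = list(product((0, 1), repeat=n_sites))
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    pair_sets = [compatible_pairs(o) for o in observations]

    for _ in range(n_iter):
        # E-step: expected haplotype counts given the current frequencies.
        counts = {h: 0.0 for h in haps}
        for pairs in pair_sets:
            weights = [freqs[h1] * freqs[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: normalize expected counts (two haplotypes per individual).
        new = {h: c / (2 * len(observations)) for h, c in counts.items()}
        delta = max(abs(new[h] - freqs[h]) for h in haps)
        freqs = new
        if delta < tol:
            break
    return freqs


if __name__ == "__main__":
    # Toy data only: four individuals with allele 0 at both sites, four with
    # allele 1 at both sites, and two mixed at both sites (phase unknown).
    data = [(0, 0)] * 4 + [(1, 1)] * 4 + [(2, 2)] * 2
    for hap, f in sorted(em_haplotype_freqs(data).items()):
        print(hap, round(f, 4))
```

In this sketch the E-step spreads each ambiguous individual across the haplotype pairs that could have produced the observed alleles, weighted by the current frequency estimates, and the M-step renormalizes the resulting expected counts. Handling an unknown number of clones per individual would require a different likelihood than the two-clone simplification used here.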