Sargsyan Ori
Los Alamos, NM, 87544, USA
J Math Biol. 2015 Mar;70(4):913-56. doi: 10.1007/s00285-014-0785-8. Epub 2014 Apr 24.
This paper presents an analytical framework for analyzing polymorphisms created by two mutation events in samples of DNA sequences modeled in the general coalescent tree setting. I developed the framework by deriving analytical formulas for the numbers of topologies of genealogies with two mutation events. This approach makes it possible to analyze polymorphisms in large samples of DNA sequences at a non-recombining locus under various evolutionary scenarios. In particular, the framework allows estimating the probability of polymorphism data created by two mutation events, as well as the ages of the events. Based on these results, I extended the definition of the site frequency spectrum by classifying pairs of polymorphic sites into groups and presented analytical expressions for computing the expected sizes of these groups. Within the framework, I also designed a Bayesian approach for inferring the haplotype of the most recent common ancestor at two polymorphic sites. Lastly, I applied the framework to polymorphism data from the human APOE gene region under various demographic scenarios for the ancestral human population and explored the signature of linkage disequilibrium for inferring the ancestral haplotype at two polymorphic sites. Interestingly, the results show that the most frequent haplotype at two completely linked polymorphic sites is not always the most likely candidate for the haplotype of the most recent common ancestor.