Parida Laxmi
Computational Biology Center, IBM T J Watson Research, Yorktown, New York, USA.
J Comput Biol. 2010 Oct;17(10):1345-70. doi: 10.1089/cmb.2009.0243.
We present a random-graphs framework for studying pedigree history in an ideal (Wright-Fisher) population. The framework brings the mathematical objects underlying, for example, the pedigree graph, the mtDNA or NRY (non-recombining Y chromosome) tree, the ARG (Ancestral Recombinations Graph), and the HUD used in the literature into a single unified random graph. It also gives a natural definition, based solely on topology, of the ARG, one of the most interesting and useful mathematical objects in this area. The framework yields an alternative parametrization of the ARG that does not use the recombination rate q and instead uses a parameter M based on (an estimate of) the number of non-mixing segments in the extant units. This seems more natural in a setting that attempts to tease apart the population dynamics from the biology of the units. The framework further gives a purely topological definition of the GMRCA, analogous to the MRCA on trees (which has a purely topological description: graph-theoretically speaking, it is the root of a tree). Next, with a natural extension of the ideas from random graphs, we present a sampling (simulation) algorithm to construct random instances of the ARG/unilinear transmission graph. This is the first algorithm (to the best of the author's knowledge) that guarantees uniform sampling of the space of ARG instances, reflecting the ideal population model. Finally, using a measure of reconstructability of past historical events given a collection of extant sequences, we conclude that, for a given set of extant sequences, the joint history of local segments along a chromosome is reconstructible.