Boni Maciej F, Posada David, Feldman Marcus W
Stanford Genome Technology Center, Palo Alto, California 94304, USA.
Genetics. 2007 Jun;176(2):1035-47. doi: 10.1534/genetics.106.068874. Epub 2007 Apr 3.
Statistical tests for detecting mosaic structure or recombination among nucleotide sequences usually rely on identifying a pattern or a signal that would be unlikely to appear under clonal reproduction. Dozens of such tests have been described, but many are hampered by long running times, confounding of selection and recombination, and/or inability to isolate the mosaic-producing event. We introduce a test that is exact, nonparametric, rapidly computable, free of the infinite-sites assumption, able to distinguish between recombination and variation in mutation/fixation rates, and able to identify the breakpoints and sequences involved in the mosaic-producing event. Our test considers three sequences at a time: two parent sequences that may have recombined, with one or two breakpoints, to form the third sequence (the child sequence). Excess similarity of the child sequence to a candidate recombinant of the parents is a sign of recombination; we take the maximum value of this excess similarity as our test statistic Delta(m,n,b). We present a method for rapidly calculating the distribution of Delta(m,n,b) and demonstrate that it has comparable power to and a much improved running time over previous methods, especially in detecting recombination in large data sets.
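The excess-similarity idea in the abstract can be illustrated with a small sketch. It rests on assumptions that the abstract does not spell out: each site where the parents differ and the child matches exactly one of them is treated as a step of +1 (child matches the first parent) or -1 (child matches the second), and the excess similarity to a two-breakpoint recombinant is read as the maximum descent of the resulting walk. The function names are hypothetical. The p-value here is a Monte Carlo permutation estimate used as a simple stand-in; it is not the exact distribution calculation that the authors describe.

```python
import random


def informative_walk(parent_p, parent_q, child):
    """Return +1/-1 steps for sites where the child matches exactly one parent.

    Assumes the three sequences are aligned and of equal length.
    """
    steps = []
    for p, q, c in zip(parent_p, parent_q, child):
        if p == q:
            continue  # parents agree: site cannot discriminate between them
        if c == p:
            steps.append(1)
        elif c == q:
            steps.append(-1)
        # sites where the child matches neither parent are skipped
    return steps


def max_descent(steps):
    """Largest drop from any earlier peak of the cumulative walk."""
    height = 0
    peak = 0
    best = 0
    for s in steps:
        height += s
        peak = max(peak, height)
        best = max(best, peak - height)
    return best


def permutation_p_value(steps, trials=2000, seed=0):
    """Monte Carlo estimate of P(max descent >= observed) under shuffled steps.

    This is an illustrative approximation, not an exact test.
    """
    rng = random.Random(seed)
    observed = max_descent(steps)
    shuffled = list(steps)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if max_descent(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)


# Toy data: the child copies parent P, then Q, then P again (two breakpoints).
parent_p = "A" * 20
parent_q = "C" * 20
child = "A" * 8 + "C" * 8 + "A" * 4

walk = informative_walk(parent_p, parent_q, child)
print(max_descent(walk), permutation_p_value(walk))
```

In this toy case the walk has 12 up-steps and 8 down-steps, and the block of consecutive down-steps is what produces a large descent. Shuffling the steps preserves the counts of each step type while destroying their clustering, which is why a clustered pattern stands out against the shuffled walks.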