Woerner August E, Cox Murray P, Hammer Michael F
Arizona Research Laboratories-Biotechnology, University of Arizona, Tucson, AZ 85721, USA.
Bioinformatics. 2007 Jul 15;23(14):1851-3. doi: 10.1093/bioinformatics/btm253. Epub 2007 May 22.
With the increasing amount of DNA sequence data available from natural populations, new computational methods are needed to efficiently process raw sequences into formats that are applicable to a variety of analytical methods. One highly successful approach to inferring aspects of demographic history is grounded in coalescent theory. Many of these methods restrict themselves to perfectly tree-like genealogies (i.e. regions with no observed recombination), because theoretical difficulties prevent ready statistical evaluation of recombining regions. However, determining which recombination-filtered dataset to analyze from a larger recombination-rich genomic region is a non-trivial problem. Current applications primarily aim to quantify recombination rates (rather than produce optimal recombination-filtered blocks), require significant manual intervention, and are impractical for multiple genomic datasets in high-throughput, automated research environments. Here, we present a fast, simple and automatable command-line program that extracts optimal recombination-filtered blocks (no four-gamete violations) from recombination-rich genomic re-sequence data.
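The four-gamete criterion mentioned above can be illustrated with a minimal sketch. It is not the program's actual algorithm, which the abstract does not describe. The sketch assumes biallelic sites coded as '0'/'1' in equal-length haplotype strings, and it uses "longest contiguous run of sites" as a stand-in for "optimal", which is an assumption. The function names `four_gamete_violation` and `longest_compatible_block` are hypothetical.

```python
def four_gamete_violation(haplotypes, i, j):
    """Return True if sites i and j together show all four gametes.

    Assumes biallelic sites, so four distinct (allele_i, allele_j)
    pairs means 00, 01, 10 and 11 are all present.
    """
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) == 4


def longest_compatible_block(haplotypes):
    """Return the half-open site range (start, end) of the longest
    contiguous block in which no pair of sites violates the
    four-gamete test. Ties keep the leftmost block.
    """
    n_sites = len(haplotypes[0]) if haplotypes else 0
    best = (0, 0)
    start = 0  # left edge of the current violation-free window
    for end in range(n_sites):
        # Scan right to left for the nearest site that conflicts with
        # the new site, and move the window's left edge just past it.
        for k in range(end - 1, start - 1, -1):
            if four_gamete_violation(haplotypes, k, end):
                start = k + 1
                break
        if end + 1 - start > best[1] - best[0]:
            best = (start, end + 1)
    return best


# Toy data (illustrative only): sites 0 and 1 show all four gametes,
# while site 2 is compatible with both.
haps = ["000", "010", "100", "110"]
print(longest_compatible_block(haps))  # (1, 3)
```

The sliding window works because a block is violation-free only if every pair of sites within it is compatible, so adding a site on the right can only push the left edge further right.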