Hampson Steve, McLysaght Aoife, Gaut Brandon, Baldi Pierre
Institute for Genomics and Bioinformatics, Department of Information and Computer Science and Department of Ecology and Evolutionary Biology, and Department of Biological Chemistry, University of California at Irvine, Irvine, California 92697, USA.
Genome Res. 2003 May;13(5):999-1010. doi: 10.1101/gr.814403. Epub 2003 Apr 14.
The identification of homologous regions between chromosomes forms the basis for studies of genome organization, comparative genomics, and evolutionary genomics. Identification of these regions can be based on either synteny or colinearity, but there are few methods to test statistically for significant evidence of homology. In the present study, we improve a preexisting method that used colinearity as the basis for statistical tests. Improvements include computational efficiency and a relaxation of the colinearity assumption. Two algorithms perform the method: FullPermutation, which searches exhaustively for runs of markers, and FastRuns, which gives up exhaustive search in exchange for faster run times. The algorithms described here are available in the LineUp package (http://www.igb.uci.edu/~baldig/lineup). We explore the performance of both algorithms on simulated data and also on genetic map data from maize (Zea mays ssp. mays). The method has reasonable power to detect a homologous region; for example, in >90% of simulations, both algorithms detect a homologous region of 10 markers buried in a random background, even when the homologous regions have diverged by numerous inversion events. The methods were applied to four maize molecular maps. All maps indicate that the maize genome contains extensive regions of genomic duplication and multiplication. Nonetheless, maps differ substantially in the location of homologous regions, probably reflecting the incomplete nature of genetic map data. The variation among maps has important implications for evolutionary inference from genetic map data.
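The general idea of testing colinearity by permutation can be illustrated with a minimal sketch. This is not the LineUp implementation, and it is neither FullPermutation nor FastRuns; the function names, the `max_gap` parameter (a simple stand-in for a relaxed colinearity assumption), and the choice of the longest run as the test statistic are all assumptions made for illustration. Each map is assumed to be a list of unique marker identifiers in map order, with homologous markers sharing an identifier.

```python
import random


def longest_run(order_a, order_b, max_gap=2):
    """Longest run of shared markers, consecutive in map A, whose
    positions in map B differ by at most max_gap (either direction)."""
    pos_b = {marker: i for i, marker in enumerate(order_b)}
    # Positions in B of the shared markers, listed in A's order.
    shared = [pos_b[m] for m in order_a if m in pos_b]
    if not shared:
        return 0
    best = cur = 1
    for prev, nxt in zip(shared, shared[1:]):
        # Extend the run if the next marker is close by in B, else restart.
        cur = cur + 1 if 0 < abs(nxt - prev) <= max_gap else 1
        best = max(best, cur)
    return best


def permutation_p_value(order_a, order_b, n_perm=1000, max_gap=2, seed=0):
    """Estimate how often a randomly shuffled map B yields a run at
    least as long as the observed one (hypothetical test design)."""
    rng = random.Random(seed)
    observed = longest_run(order_a, order_b, max_gap)
    shuffled = list(order_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if longest_run(order_a, shuffled, max_gap) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `permutation_p_value(map_a, map_b)` on two marker lists returns the observed run length and an estimated p-value. Because runs are accepted in either direction within `max_gap`, a segment that has been inverted still counts as a run, which is one crude way to tolerate the inversion events mentioned in the abstract.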