Soderlund Carol, Nelson William, Shoemaker Austin, Paterson Andrew
Arizona Genomics Computational Laboratory, The Bio5 Institute, University of Arizona, Tucson, Arizona 85721, USA.
Genome Res. 2006 Sep;16(9):1159-68. doi: 10.1101/gr.5396706.
Previous approaches to comparing gene and chromosome organization between two genomes have been based on genetic maps or genomic sequences. We have developed a system to align an FPC-based physical map to a genomic sequence based on BAC end sequences and sequence-tagged hybridization markers and to align two FPC maps to one another based on shared markers and fingerprints. The system, called SyMAP (Synteny Mapping and Analysis Program), consists of an algorithm to compute synteny blocks and Web-based graphics to visualize the results. The approach to calculating the anchors (corresponding elements on the respective maps) maximizes the inclusion of anchors with different rates of divergence. Chains (putative syntenic sets of anchors) are computed using a dynamic programming algorithm, which includes off-diagonal anchors that result from map coordinate errors and small inversions. Because the gap parameters (the distances allowed between anchors in a chain) can vary over different data sets and be difficult to set manually, they are automatically computed per data set. The criterion for a chain to be acceptable is based on the number of anchors and the Pearson correlation coefficient. Neighboring chains are merged into synteny blocks for display. This algorithm has been tested with three data sets that vary in the number of BACs, BAC end sequences, hybridization markers, distance between anchors, and number and antiquity of genome duplication events. The Web-based graphics use Java for a highly interactive display that allows the user to interrogate the evidence of synteny.
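The chaining and acceptance steps described in the abstract can be illustrated with a minimal sketch. This is not SyMAP's implementation: the abstract does not give the scoring function or how the gap parameters are derived, so the sketch assumes a simple longest-chain dynamic program with fixed gap limits. The names `best_chain`, `is_acceptable`, `max_gap_x`, `max_gap_y`, `min_anchors`, and `min_r` are hypothetical, and anchors are assumed to be `(x, y)` coordinate pairs on the two maps.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate case: no variation on one of the maps
    return sxy / sqrt(sxx * syy)


def best_chain(anchors, max_gap_x, max_gap_y):
    """Longest chain in which consecutive anchors lie within the gap limits.

    Assumption: the gap on the second map is tested with abs(), so an anchor
    may step slightly backward. This is one simple way to tolerate the
    off-diagonal anchors that coordinate errors and small inversions produce.
    """
    pts = sorted(anchors)
    if not pts:
        return []
    score = [1] * len(pts)   # score[i]: length of the best chain ending at i
    prev = [-1] * len(pts)   # back-pointer used to recover that chain
    for i, (xi, yi) in enumerate(pts):
        for j in range(i):
            xj, yj = pts[j]
            within_gap = xi - xj <= max_gap_x and abs(yi - yj) <= max_gap_y
            if within_gap and score[j] + 1 > score[i]:
                score[i] = score[j] + 1
                prev[i] = j
    i = max(range(len(pts)), key=score.__getitem__)
    chain = []
    while i != -1:
        chain.append(pts[i])
        i = prev[i]
    return chain[::-1]


def is_acceptable(chain, min_anchors, min_r):
    """Accept a chain on anchor count and absolute Pearson correlation."""
    if len(chain) < min_anchors:
        return False
    xs, ys = zip(*chain)
    return abs(pearson(xs, ys)) >= min_r
```

For example, `best_chain([(10, 12), (20, 19), (30, 33), (40, 41), (45, 38), (500, 5)], 50, 50)` keeps the first five anchors, including the slightly off-diagonal `(45, 38)`, and leaves out the distant `(500, 5)`. The absolute value of the correlation is used so that a chain running in the reverse orientation is not rejected. That choice, like the fixed thresholds, is an assumption of the sketch.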