Scaffold filling, contig fusion and comparative gene order inference.

Affiliation

Department of Mathematics and Statistics, University of Ottawa, Ottawa, K1N 6N5, Canada.

Publication information

BMC Bioinformatics. 2010 Jun 4;11:304. doi: 10.1186/1471-2105-11-304.

Abstract

BACKGROUND

There is a trend toward broadening the phylogenetic scope of genome sequencing without finishing the sequence of each genome, and increasing numbers of genomes are being published in scaffold or contig form. Rearrangement algorithms, however, including gene order-based phylogenetic tools, require whole-genome data on gene order or syntenic block order. How, then, can we use rearrangement algorithms to compare genomes available in scaffold form only? Can the comparative evidence predict the location of unsequenced genes?

RESULTS

Our method optimally fills in genes missing from the scaffolds and incorporates the augmented scaffolds directly into the rearrangement algorithms as if they were chromosomes. This is accomplished by an exact, polynomial-time algorithm. We then correct for the number of extra fusion/fission operations required to make scaffolds comparable to full assemblies. We model the relationship between, on one hand, the ratio of missing genes that are actually absent from the genome to those that are merely unsequenced and, on the other, the increase in genomic distance after scaffold filling. We estimate the parameters of this model through simulations and by comparing the angiosperm genomes Ricinus communis and Vitis vinifera.
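As a small illustration of the input to scaffold filling, the sketch below finds the genes that appear in a fully assembled reference genome but in none of the scaffolds of the other genome. It is not the filling algorithm described above. The representation (each chromosome or scaffold as a list of signed integers, with the sign giving strand) and the names `Genome`, `gene_set`, and `missing_genes` are assumptions made for this example.

```python
from typing import List, Set

# Assumed representation: a genome is a list of fragments (chromosomes or
# scaffolds); each fragment is a list of signed gene IDs, sign = strand.
Genome = List[List[int]]


def gene_set(genome: Genome) -> Set[int]:
    """Return the unsigned gene IDs present anywhere in the genome."""
    return {abs(gene) for fragment in genome for gene in fragment}


def missing_genes(reference: Genome, scaffolds: Genome) -> List[int]:
    """Genes found in the reference but absent from every scaffold."""
    return sorted(gene_set(reference) - gene_set(scaffolds))


# Toy data, invented for illustration only.
reference = [[1, 2, 3, 4], [5, -6, 7]]
scaffolds = [[1, 2], [4], [-6, 7]]

missing = missing_genes(reference, scaffolds)
print(missing)  # [3, 5]
```

In this toy case genes 3 and 5 are the candidates that a filling procedure would need to place into the scaffolds. Whether such genes are truly absent or merely unsequenced is the question the model above addresses.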

CONCLUSIONS

The algorithm compares genomes with 18,300 genes, including 4,500 missing from one genome, in less than a minute on a MacBook, putting virtually all genomes within range of the method.
