Parameters for accurate genome alignment.

Affiliation

Computational Biology Research Center, Institute for Advanced Industrial Science and Technology, Tokyo 135-0064, Japan.

Publication

BMC Bioinformatics. 2010 Feb 9;11:80. doi: 10.1186/1471-2105-11-80.

Abstract

BACKGROUND

Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed.
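To make "score parameters" concrete, here is a minimal Python sketch that scores a gapped pairwise alignment under a match/mismatch scheme with affine gap costs. It assumes a gap of length k costs a + b*k (gap existence a, gap extension b). The numeric values are hypothetical placeholders, not the HOXD parameters and not the paper's recommended ones. Real genome aligners typically use a full 4x4 substitution matrix rather than a single match/mismatch pair.

```python
# Hypothetical score parameters, for illustration only.
MATCH = 1
MISMATCH = -1
GAP_EXISTENCE = 7   # "a" in the assumed gap cost a + b*k
GAP_EXTENSION = 1   # "b" in the assumed gap cost a + b*k


def alignment_score(top: str, bottom: str) -> int:
    """Score a gapped pairwise alignment given as two equal-length rows ('-' = gap)."""
    if len(top) != len(bottom):
        raise ValueError("alignment rows must have equal length")
    score = 0
    gap_row = None  # row holding the current gap run: "top", "bottom", or None
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("column with gaps in both rows")
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            if row != gap_row:
                score -= GAP_EXISTENCE  # a new gap run starts here
            score -= GAP_EXTENSION      # every gap column costs b
            gap_row = row
        else:
            # Compare case-insensitively (lowercase often marks soft-masked repeats).
            score += MATCH if x.upper() == y.upper() else MISMATCH
            gap_row = None
    return score
```

For example, in `alignment_score("ACGT--A", "ACTTGGA")` the substitution columns contribute 3 and the two-column gap costs 7 + 2 = 9, giving -6. Changing any of the four constants changes which alignment scores highest, which is why the choice of parameters matters.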

RESULTS

We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that gamma-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases.
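The X-drop parameter controls when alignment extension gives up. Below is a minimal sketch of the textbook gapless variant, assuming a hypothetical +1/-1 substitution score. Extension proceeds base by base, stops once the running score falls more than X below the best score seen, and reports the best-scoring prefix. The paper's experiments concern gapped extension in LAST; this illustrates only the general idea, not LAST's code.

```python
from typing import Callable, Tuple


def xdrop_extend_right(
    s1: str,
    s2: str,
    i: int,
    j: int,
    score_fn: Callable[[str, str], int],
    x_drop: int,
) -> Tuple[int, int]:
    """Gapless X-drop extension to the right, starting at s1[i] and s2[j].

    Returns (best_length, best_score). Stops when the running score falls
    more than x_drop below the best score so far, or at a sequence end.
    """
    best_score = best_len = running = 0
    k = 0
    while i + k < len(s1) and j + k < len(s2):
        running += score_fn(s1[i + k], s2[j + k])
        k += 1
        if running > best_score:
            best_score, best_len = running, k
        elif best_score - running > x_drop:
            break  # score dropped too far below the best: stop extending
    return best_len, best_score


def match_mismatch(a: str, b: str) -> int:
    """Hypothetical +1/-1 substitution score, for illustration only."""
    return 1 if a == b else -1


s1, s2 = "AAAACCAAAA", "AAAAGGAAAA"
print(xdrop_extend_right(s1, s2, 0, 0, match_mismatch, 1))
print(xdrop_extend_right(s1, s2, 0, 0, match_mismatch, 3))
```

With X = 1 the extension stops at the two mismatched columns and reports length 4, score 4; with X = 3 it bridges them and reaches length 10, score 6. Bridging helps only if the flanking sequence is truly homologous, which suggests one way a larger X can be worse rather than better.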

CONCLUSIONS

These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).
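One reliability measure for local alignments is the E-value: the expected number of chance alignments scoring at least S. A minimal sketch uses the classic Karlin-Altschul formula for ungapped alignment, E = K * m * n * exp(-lambda * S). Here lambda is solved numerically from base frequencies and substitution scores, and K is simply supplied. The uniform base frequencies, the +1/-1 scores and K = 0.1 are hypothetical. For gapped alignments, such as LAST's, lambda and K do not follow from this equation and must be estimated by other means, so this sketch should not be read as LAST's own procedure.

```python
import math
from typing import Dict, Tuple


def karlin_altschul_lambda(
    freqs: Dict[str, float],
    score: Dict[Tuple[str, str], int],
) -> float:
    """Positive root lambda of sum_ij p_i p_j exp(lambda * s_ij) = 1.

    Assumes a negative expected score and at least one positive score.
    """
    def f(lam: float) -> float:
        return sum(
            freqs[a] * freqs[b] * math.exp(lam * score[(a, b)])
            for a in freqs for b in freqs
        ) - 1.0

    hi = 1.0
    while f(hi) <= 0:  # widen the bracket until f becomes positive
        hi *= 2
    lo = 0.0  # f(0) = 0 and f < 0 just above 0, so the root lies in (lo, hi]
    for _ in range(100):  # bisection
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def evalue(s: float, m: int, n: int, lam: float, k: float) -> float:
    """Expected number of chance local alignments with score >= s."""
    return k * m * n * math.exp(-lam * s)


bases = "ACGT"
freqs = {b: 0.25 for b in bases}  # hypothetical uniform composition
score = {(a, b): (1 if a == b else -1) for a in bases for b in bases}
lam = karlin_altschul_lambda(freqs, score)  # approximately ln(3) for this scheme
print(lam, evalue(20, 1000, 1000, lam, k=0.1))  # K = 0.1 is a placeholder
```

For this scheme the equation reduces to e^lambda/4 + 3e^(-lambda)/4 = 1, whose positive root is lambda = ln 3, which the bisection reproduces. E-values are only as trustworthy as the assumption that unrelated sequence looks random; the abstract's remark about masking tandem repeats bears on exactly that.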
