A universal genomic coordinate translator for comparative genomics.

Affiliation

Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.

Publication information

BMC Bioinformatics. 2014 Jun 30;15:227. doi: 10.1186/1471-2105-15-227.

Abstract

BACKGROUND

Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. The orthology relationships between copies across multiple genomes can be resolved unambiguously by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, the number of which grows as N^2 with the number of available genomes, N.
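To make the scaling argument concrete, the short sketch below counts pairwise alignments under two assumptions: an all-to-all design needs one alignment per unordered genome pair, and a connected graph of pairwise alignments needs at least N - 1 edges. The function names are hypothetical, and the N - 1 figure is a general property of connected graphs, not a statement about how many alignments the authors used.

```python
def all_to_all_alignments(n: int) -> int:
    """Unordered genome pairs: n * (n - 1) / 2, which grows as N^2."""
    return n * (n - 1) // 2


def minimal_connected_alignments(n: int) -> int:
    """Fewest pairwise alignments that still connect n genomes (assumption:
    any connected graph on n nodes has at least n - 1 edges)."""
    return max(n - 1, 0)


for n in (3, 12, 100):
    print(n, all_to_all_alignments(n), minimal_connected_alignments(n))
```

For 12 genomes this gives 66 pairwise alignments for the all-to-all design versus 11 for a minimally connected graph.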

RESULTS

Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows large numbers of genomes to be included in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes (human, mouse, and rat) and showed that up to 93% of protein-coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% match on at least one junction. Finally, we applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Not limited to protein-coding genes, Kraken also identifies thousands of novel transcribed loci, likely non-coding RNAs, that are consistently transcribed in human, chimpanzee, and gorilla and that maintain significant correlation of expression levels across species.
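The idea of translating coordinates indirectly through a graph of pairwise maps can be illustrated with a minimal sketch. It is not Kraken's implementation: the block format, the function names `translate_direct` and `translate`, and all coordinates are hypothetical, and the sketch assumes ungapped, forward-strand blocks and requires an interval to fall entirely within one block. Real synteny maps also involve gaps, inversions, and ambiguity handling, which the abstract does not detail.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Interval = Tuple[str, int, int]          # (chromosome, start, end), half-open
Block = Tuple[str, int, int, str, int]   # (src_chrom, src_start, src_end, dst_chrom, dst_start)
MapGraph = Dict[Tuple[str, str], List[Block]]  # (src_genome, dst_genome) -> blocks


def translate_direct(blocks: List[Block], iv: Interval) -> Optional[Interval]:
    """Map an interval through one pairwise map; None if no single block contains it."""
    chrom, start, end = iv
    for s_chrom, s_start, s_end, d_chrom, d_start in blocks:
        if chrom == s_chrom and s_start <= start and end <= s_end:
            offset = d_start - s_start
            return (d_chrom, start + offset, end + offset)
    return None


def translate(maps: MapGraph, src: str, dst: str, iv: Interval,
              visited: Optional[FrozenSet[str]] = None) -> Optional[Interval]:
    """Recursively walk the graph of pairwise maps from src to dst genome."""
    if src == dst:
        return iv
    visited = (visited or frozenset()) | {src}
    for (a, b), blocks in maps.items():
        if a != src or b in visited:
            continue  # edge does not start here, or would revisit a genome
        hop = translate_direct(blocks, iv)
        if hop is None:
            continue  # interval is not covered by this pairwise map
        result = translate(maps, b, dst, hop, visited)
        if result is not None:
            return result
    return None


# Made-up maps: human->mouse and mouse->rat exist, human->rat does not.
maps: MapGraph = {
    ("human", "mouse"): [("chr1", 1000, 2000, "chr4", 5000)],
    ("mouse", "rat"): [("chr4", 4500, 6500, "chr5", 300)],
}

print(translate(maps, "human", "rat", ("chr1", 1200, 1300)))
```

With these made-up blocks, the human interval is first shifted into mouse coordinates and then into rat coordinates, even though no direct human-to-rat map is present. An interval that extends beyond a block returns `None` in this simplified version.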

CONCLUSIONS

Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole-genome mappings. Kraken is freely available under the LGPL from http://github.com/nedaz/kraken.
