Ultra-fast genome comparison for large-scale genomic experiments.

Affiliation

Computer Architecture Department, University of Málaga - Instituto de Investigación Biomédica de Málaga-IBIMA, Málaga, Spain.

Publication information

Sci Rep. 2019 Jul 16;9(1):10274. doi: 10.1038/s41598-019-46773-w.

Abstract

In the last decade, a technological shift in the bioinformatics field has occurred: larger genomes can now be sequenced quickly and cost-effectively, resulting in the computational need to efficiently compare large and abundant sequences. Furthermore, detecting conserved similarities across large collections of genomes remains a problem. The size of chromosomes, along with the substantial amount of noise and number of repeats found in DNA sequences (particularly in mammals and plants), leads to a scenario where executing and waiting for complete outputs is both time- and resource-consuming. Filtering steps, manual examination and annotation, very long execution times and a high demand for computational resources represent a few of the many difficulties faced in large genome comparisons. In this work, we provide a method designed for comparisons of considerable amounts of very long sequences that employs a heuristic algorithm capable of separating noise and repeats from conserved fragments in pairwise genomic comparisons. We provide a software implementation that runs in linear time, requires only a single core, and has a small, constant memory footprint. The method produces both a previsualization of the comparison and a collection of indices to drastically reduce computational complexity when performing exhaustive comparisons. Finally, the method scores the comparison to automate classification of sequences and produces a list of detected synteny blocks to enable new evolutionary studies.
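
The abstract does not describe the heuristic itself, so the following is only an illustrative sketch and not the authors' method. It assumes one common way of separating repeats from conserved fragments: keep only the k-mers that occur exactly once in each sequence, pair up the ones shared by both, and bin the resulting hits into a coarse grid as a preview of the comparison. The function names (`unique_kmers`, `unique_hits`, `dotplot`) and the parameters `k` and `size` are hypothetical. Unlike the software described above, this sketch holds all k-mers in memory, so it is roughly linear in time for a fixed `k` but not constant in memory.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start position."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {seq[i:i + k]: i
            for i in range(len(seq) - k + 1)
            if counts[seq[i:i + k]] == 1}


def unique_hits(query, reference, k=4):
    """Return sorted (query_pos, reference_pos) pairs for k-mers unique in both.

    Repeated k-mers (assumed here to represent repeats or noise) are dropped
    because they never appear in either unique-k-mer table.
    """
    q = unique_kmers(query, k)
    r = unique_kmers(reference, k)
    return sorted((q[m], r[m]) for m in q.keys() & r.keys())


def dotplot(hits, q_len, r_len, size=10):
    """Bin hits into a size x size grid of counts (a coarse preview)."""
    grid = [[0] * size for _ in range(size)]
    for qp, rp in hits:
        grid[qp * size // q_len][rp * size // r_len] += 1
    return grid


# Toy example: a conserved fragment next to a low-complexity repeat.
query = "ACGTTGCA" + "AAAAAAAA"
reference = "AAAAAAAA" + "ACGTTGCA"
hits = unique_hits(query, reference, k=4)
preview = dotplot(hits, len(query), len(reference), size=4)
```

In this toy input the conserved fragment produces a run of hits along a single diagonal, while the poly-A repeat contributes none, which is the kind of signal a synteny-block detector could then group.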
