Kollector: transcript-informed, targeted de novo assembly of gene loci.

Authors

Kucuk Erdi, Chu Justin, Vandervalk Benjamin P, Hammond S Austin, Warren René L, Birol Inanc

Affiliations

University of British Columbia, Vancouver, BC, Canada.

Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada.

Publication information

Bioinformatics. 2017 Jun 15;33(12):1782-1788. doi: 10.1093/bioinformatics/btx078.

Abstract

MOTIVATION

Despite considerable advancements in sequencing and computing technologies, de novo assembly of whole eukaryotic genomes is still a time-consuming task that requires a significant amount of computational resources and expertise. A targeted assembly approach to perform local assembly of sequences of interest remains a valuable option for some applications. This is especially true for gene-centric assemblies, whose resulting sequences can be readily utilized for more focused biological research. Here we describe Kollector, an alignment-free targeted assembly pipeline that uses thousands of transcript sequences concurrently to inform the localized assembly of corresponding gene loci. Kollector robustly reconstructs introns and novel sequences within these loci, and scales well to large genomes, properties that make it especially useful for researchers working on non-model eukaryotic organisms.
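To make the phrase "alignment-free" concrete, the toy sketch below recruits reads by shared k-mers rather than by aligning them, and lets already-recruited reads extend the k-mer index so that intronic reads can be reached from exonic seeds. It is an illustration only, not Kollector's implementation: the function names, the plain Python set used as the index, the thresholds, and the toy sequences are all assumptions, reverse complements are ignored, and the abstract does not describe the pipeline's actual recruitment or assembly steps.

```python
from typing import Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers in seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(transcripts: Iterable[str], reads: Iterable[str],
                  k: int = 5, min_shared: int = 2,
                  rounds: int = 2) -> List[str]:
    """Hypothetical helper: recruit reads sharing k-mers with the seeds.

    The index starts with transcript k-mers. After each round, the k-mers
    of newly recruited reads are added, so a read lying entirely in an
    intron can be recruited if it overlaps a read recruited earlier.
    Simplification: reverse-complement k-mers are not considered.
    """
    index: Set[str] = set()
    for transcript in transcripts:
        index |= kmers(transcript, k)

    reads = list(reads)
    taken: Set[int] = set()
    recruited: List[str] = []
    for _ in range(rounds):
        new_kmers: Set[str] = set()
        for i, read in enumerate(reads):
            if i in taken:
                continue
            read_kmers = kmers(read, k)
            if len(read_kmers & index) >= min_shared:
                taken.add(i)
                recruited.append(read)
                new_kmers |= read_kmers
        if not new_kmers:
            break  # nothing new was recruited, so the index cannot grow
        index |= new_kmers  # index grows only between rounds
    return recruited


# Toy locus: exon1 + intron + exon2; the transcript lacks the intron.
exon1, exon2 = "ACGTACGGTCA", "GATTACAGATC"
transcript = exon1 + exon2
toy_reads = [
    "GTACGGTCA",        # lies within exon1
    "CGGTCATTTGGGCC",   # spans the exon1/intron boundary
    "TTGGGCCCAAA",      # lies entirely within the intron
    "CCCCCTTTTTCCCCC",  # unrelated sequence
]
recruited = recruit_reads([transcript], toy_reads, k=5, min_shared=2, rounds=2)
```

In this toy run the purely intronic read shares no k-mers with the transcript and is picked up only in the second round, through its overlap with the boundary-spanning read; the unrelated read is never recruited. The recruited reads would then be handed to a de novo assembler, a step this sketch leaves out.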

RESULTS

We demonstrate the performance of Kollector in assembling complete or near-complete Caenorhabditis elegans and Homo sapiens gene loci from their respective input transcripts. Using whole-genome shotgun sequencing reads, the Kollector pipeline reconstructs 99% of C. elegans and 80% of H. sapiens transcript targets in their corresponding genomic space in a time- and memory-efficient manner, compared with 86% and 73%, respectively, for standard de novo assembly techniques. We also show that Kollector outperforms both established and recently released targeted assembly tools. Finally, we demonstrate three use cases for Kollector, including comparative and cancer genomics applications.

AVAILABILITY AND IMPLEMENTATION

Kollector is implemented as a bash script, and is available at https://github.com/bcgsc/kollector.

CONTACT

ibirol@bcgsc.ca.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
