Bioinformatics. 2018 Mar 1;34(5):725-731. doi: 10.1093/bioinformatics/btx675.
Sequencing of human genomes is now routine, and assembly of shotgun reads is increasingly feasible. However, assemblies often fail to capture chromosome-scale structure because they lack linkage information over long stretches of DNA, a shortcoming that is being addressed by new sequencing protocols such as the GemCode and Chromium linked reads from 10x Genomics.
Here, we present ARCS, an application that uses the barcoding information contained in linked reads to further organize draft genomes into highly contiguous assemblies. We show how the contiguity of an ABySS H. sapiens genome assembly can be increased more than six-fold using moderate-coverage (25-fold) Chromium data. We expect ARCS to have broad utility in harnessing the barcoding information contained in linked read data for connecting high-quality sequences in genome assembly drafts.
ARCS is available at https://github.com/bcgsc/ARCS/.
Supplementary data are available at Bioinformatics online.
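The idea of using linked-read barcodes to connect draft sequences can be illustrated with a toy example. The sketch below is not the ARCS implementation; it only assumes that each read alignment has been reduced to a (barcode, contig) pair, and it counts how many barcodes two contigs share. The function name `barcode_links`, the `min_shared` threshold, and the sample data are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def barcode_links(alignments, min_shared=2):
    """Count barcodes shared by each pair of contigs.

    alignments: iterable of (barcode, contig) pairs (hypothetical input format).
    min_shared: minimum number of shared barcodes needed to keep a pair
                (an illustrative threshold, not a value taken from ARCS).
    """
    # Collect the set of contigs that each barcode has reads aligned to.
    contigs_by_barcode = defaultdict(set)
    for barcode, contig in alignments:
        contigs_by_barcode[barcode].add(contig)

    # Every pair of contigs seen under the same barcode gains one unit of support.
    shared = defaultdict(int)
    for contigs in contigs_by_barcode.values():
        for a, b in combinations(sorted(contigs), 2):
            shared[(a, b)] += 1

    # Keep only the pairs supported by enough distinct barcodes.
    return {pair: n for pair, n in shared.items() if n >= min_shared}


# Made-up example data: three barcodes, three contigs.
alignments = [
    ("BX1", "contigA"), ("BX1", "contigB"),
    ("BX2", "contigA"), ("BX2", "contigB"),
    ("BX3", "contigB"), ("BX3", "contigC"),
]
links = barcode_links(alignments)
print(links)  # {('contigA', 'contigB'): 2}
```

In this toy data, contigA and contigB share two barcodes and are kept as a candidate link, while contigB and contigC share only one and are filtered out. A real scaffolder would also need to consider where on each contig the reads align and in which orientation, which this sketch deliberately omits.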