Michael Smith Genome Sciences Centre, BC Cancer Agency.
Department of Pathology and Laboratory Medicine, Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada.
Bioinformatics. 2017 Sep 1;33(17):2737-2739. doi: 10.1093/bioinformatics/btx281.
Massively parallel sequencing is now widely used, but data interpretation is only as good as the reference assembly to which it is aligned. While the number of reference assemblies has rapidly expanded, most of these remain at intermediate stages of completion, either as scaffold builds, or as chromosome builds (consisting of correctly ordered, but not necessarily correctly oriented scaffolds separated by gaps). Completion of de novo assemblies remains difficult, as regions that are repetitive or hard to sequence prevent the accumulation of larger scaffolds, and create errors such as misorientations and mislocalizations. Thus, complementary methods for determining the orientation and positioning of fragments are important for finishing assemblies. Strand-seq is a method for determining template strand inheritance in single cells, information that can be used to determine relative genomic distance and orientation between scaffolds, and find errors within them. We present contiBAIT, an R/Bioconductor package which uses Strand-seq data to repair and improve existing assemblies.
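To illustrate how template strand inheritance can reveal linkage and relative orientation between scaffolds, the following is a minimal Python sketch. It is not contiBAIT's implementation or API. It assumes each scaffold has already been assigned a strand state per single cell ("WW", "CC" or "WC"); the function names, the 0.9 threshold and the toy data are all hypothetical. The idea is that scaffolds from the same chromosome should share strand states across cells, while a misoriented scaffold should show WW and CC swapped.

```python
from typing import List, Tuple

# Strand state of a scaffold in one cell: both templates Watson ("WW"),
# both Crick ("CC"), or one of each ("WC"). Reversing a scaffold's
# orientation swaps WW and CC and leaves WC unchanged.
FLIP = {"WW": "CC", "CC": "WW", "WC": "WC"}


def compare_scaffolds(a: List[str], b: List[str]) -> Tuple[float, float]:
    """Return the fraction of cells in which b matches a as-is,
    and the fraction in which b matches a after flipping b."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty state lists of equal length")
    same = sum(x == y for x, y in zip(a, b)) / len(a)
    flipped = sum(x == FLIP[y] for x, y in zip(a, b)) / len(a)
    return same, flipped


def classify_pair(a: List[str], b: List[str], threshold: float = 0.9) -> str:
    """Classify a scaffold pair (hypothetical threshold, illustration only)."""
    same, flipped = compare_scaffolds(a, b)
    if same >= threshold and same > flipped:
        return "linked_same_orientation"
    if flipped >= threshold and flipped > same:
        return "linked_opposite_orientation"
    if same >= threshold and flipped >= threshold:
        # Only WC cells were observed, so orientation cannot be resolved.
        return "linked_orientation_unknown"
    return "unlinked"


# Toy data: strand states of three scaffolds across six cells.
scaffold_a = ["WW", "CC", "WC", "WW", "CC", "WC"]
scaffold_b = ["CC", "WW", "WC", "CC", "WW", "WC"]  # same pattern as a, flipped
scaffold_c = ["WC", "WW", "CC", "WC", "WW", "CC"]  # unrelated pattern

result_ab = classify_pair(scaffold_a, scaffold_b)
result_ac = classify_pair(scaffold_a, scaffold_c)
result_aa = classify_pair(scaffold_a, scaffold_a)
print(result_ab, result_ac, result_aa)
```

With real data, strand-state calls are noisy and many more cells are needed, so a fixed agreement threshold like the one assumed here would likely be replaced by a clustering step.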
contiBAIT is available on Bioconductor. Source files are available from GitHub.
koneill@bcgsc.ca or mark.hills@stemcell.com.
Supplementary data are available at Bioinformatics online.