Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC, USA.
Duke Cancer Institute, Duke University Medical Center, Durham, NC, USA.
Bioinformatics. 2018 Oct 15;34(20):3581-3583. doi: 10.1093/bioinformatics/bty402.
CRISPR-Cas9 and shRNA high-throughput sequencing screens have abundant applications for basic and translational research. Methods and tools for the analysis of these screens must properly account for sequencing error, resolve ambiguous mappings among similar sequences in the barcode library in a statistically principled manner, and be computationally efficient. Herein we present bcSeq, an open source R package that implements a fast and parallelized algorithm for mapping high-throughput sequencing reads to a barcode library while tolerating sequencing error. The algorithm uses a Trie data structure for speed and resolves ambiguous mappings by using a statistical sequencing error model based on Phred scores for each read.
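The abstract names two ideas: a Trie over the barcode library for fast, error-tolerant lookup, and a Phred-score error model for resolving ambiguous mappings. Below is a minimal Python sketch of both. It is not bcSeq's implementation, since bcSeq is an R package. It rests on these assumptions:

- Only substitutions are tolerated; there are no insertions or deletions.
- Reads are already trimmed to barcode length.
- Quality strings use Phred+33 encoding.
- A miscalled base is equally likely to be any of the other three bases.
- Candidate barcodes have a uniform prior.

The function names (`build_trie`, `search`, `assign`) and the toy barcodes `g1`, `g2` and `g3` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """A node in a character trie built over the barcode library."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.barcode_id: Optional[str] = None  # set only on the node that ends a barcode


def build_trie(barcodes: Dict[str, str]) -> TrieNode:
    """Insert every barcode sequence; barcodes with a shared prefix share nodes."""
    root = TrieNode()
    for bc_id, seq in barcodes.items():
        node = root
        for base in seq:
            node = node.children.setdefault(base, TrieNode())
        node.barcode_id = bc_id
    return root


def search(root: TrieNode, read: str, max_mismatch: int) -> List[Tuple[str, List[int]]]:
    """Depth-first walk tolerating up to max_mismatch substitutions (an assumption).

    Returns (barcode_id, mismatch_positions) for every barcode within the budget.
    A branch is pruned as soon as it would exceed the mismatch budget.
    """
    hits: List[Tuple[str, List[int]]] = []
    stack: List[Tuple[TrieNode, int, List[int]]] = [(root, 0, [])]
    while stack:
        node, depth, mismatches = stack.pop()
        if depth == len(read):
            if node.barcode_id is not None:
                hits.append((node.barcode_id, mismatches))
            continue
        for base, child in node.children.items():
            if base == read[depth]:
                stack.append((child, depth + 1, mismatches))
            elif len(mismatches) < max_mismatch:
                stack.append((child, depth + 1, mismatches + [depth]))
    return hits


def phred_to_error(qual: str, offset: int = 33) -> List[float]:
    """Convert Phred+33 quality characters to error probabilities 10^(-Q/10)."""
    return [10 ** (-(ord(c) - offset) / 10) for c in qual]


def assign(read: str, qual: str, root: TrieNode, max_mismatch: int = 2) -> Dict[str, float]:
    """Posterior probability that the read came from each candidate barcode.

    Assumed model: P(read | barcode) is the product over positions of (1 - e_i)
    at matching bases and e_i / 3 at mismatching bases, normalised over all
    candidates (uniform prior).
    """
    errors = phred_to_error(qual)
    likelihoods: Dict[str, float] = {}
    for bc_id, mismatches in search(root, read, max_mismatch):
        mismatch_set = set(mismatches)
        lik = 1.0
        for i, e in enumerate(errors):
            lik *= e / 3 if i in mismatch_set else 1 - e
        likelihoods[bc_id] = lik
    total = sum(likelihoods.values())
    if total == 0:
        return {}
    return {bc_id: lik / total for bc_id, lik in likelihoods.items()}


if __name__ == "__main__":
    library = {"g1": "ACGT", "g2": "ACGA", "g3": "TTTT"}  # hypothetical toy library
    trie = build_trie(library)
    # The last base has quality '#' (Q2, error probability ~0.63), so the read is
    # ambiguous between g2 (exact match) and g1 (one mismatch at that base).
    print(assign("ACGA", "III#", trie))  # g2 ~0.64, g1 ~0.36; g3 is too distant
```

With a high-quality final base (for example `"IIII"`, Q40), nearly all of the weight goes to `g2`. With a low-quality final base, the weight is split between `g2` and `g1`. This is the intended effect of weighting mismatches by Phred-derived error probabilities rather than counting them.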
The package source code and an accompanying tutorial are available at http://bioconductor.org/packages/bcSeq/.
Supplementary data are available at Bioinformatics online.