Yi Haisi, Li Zhe, Li Tao, Zhao Jindong
Key Laboratory of Algal Biology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, Hubei 430072, China; University of Chinese Academy of Sciences, Beijing 100049, China.
State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China.
Bioinformatics. 2015 Dec 15;31(24):4000-2. doi: 10.1093/bioinformatics/btv501. Epub 2015 Aug 26.
Demultiplexing is performed after high-throughput sequencing to assign reads in silico to their samples of origin, based on the sequenced index reads. Existing demultiplexing tools, which rely on the similarity between the read index and the reference index sequences, may fail to provide satisfactory results on low-quality datasets. We developed Bayexer, a Bayesian demultiplexing algorithm for Illumina sequencers. Bayexer uses information extracted directly from the contaminant sequences of the target reads as the training dataset for a naïve Bayes classifier that assigns reads to samples. In our evaluation, Bayexer achieved higher capability, accuracy and speed than other tools on various real datasets.
Bayexer is implemented in Perl and freely available at https://github.com/HaisiYi/Bayexer.
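As an illustration of the general idea only, the following minimal Python sketch shows how a naïve Bayes classifier can assign an index read to a sample using per-position base frequencies. It is not Bayexer's implementation (which is written in Perl): the function names `train` and `classify`, the pseudocount smoothing, and the toy training reads are all assumptions made for this example, and the step in which Bayexer derives its training data from contaminant sequences is not reproduced here.

```python
import math
from collections import defaultdict

ALPHABET = "ACGTN"  # assumption: any other symbol is treated as N


def _norm(base):
    """Map unexpected symbols to N so lookups never fail."""
    return base if base in ALPHABET else "N"


def train(labeled_reads, pseudocount=1.0):
    """Fit per-sample, per-position base log-probabilities.

    labeled_reads: iterable of (index_read, sample) pairs, all reads of
    equal length. In this sketch the labels are simply given (hypothetical).
    """
    counts = {}
    totals = defaultdict(int)
    length = None
    for read, sample in labeled_reads:
        if length is None:
            length = len(read)
        elif len(read) != length:
            raise ValueError("all index reads must have the same length")
        per_pos = counts.setdefault(
            sample, [defaultdict(int) for _ in range(length)]
        )
        for pos, base in enumerate(read):
            per_pos[pos][_norm(base)] += 1
        totals[sample] += 1

    n_reads = sum(totals.values())
    model = {}
    for sample, per_pos in counts.items():
        denom = totals[sample] + pseudocount * len(ALPHABET)
        log_like = [
            {b: math.log((pos_counts[b] + pseudocount) / denom) for b in ALPHABET}
            for pos_counts in per_pos
        ]
        model[sample] = (math.log(totals[sample] / n_reads), log_like)
    return model


def classify(model, read):
    """Return (most probable sample, its posterior probability)."""
    scores = {}
    for sample, (log_prior, log_like) in model.items():
        if len(read) != len(log_like):
            raise ValueError("read length does not match the model")
        scores[sample] = log_prior + sum(
            log_like[pos][_norm(base)] for pos, base in enumerate(read)
        )
    # Normalise with log-sum-exp to obtain a posterior probability.
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    best = max(scores, key=scores.get)
    return best, math.exp(scores[best] - log_z)


# Toy, made-up training data: two samples with 4-base index reads.
training = [("ACGT", "A")] * 3 + [("ACGA", "A")] + [("TTGC", "B")] * 4
model = train(training)
sample, posterior = classify(model, "ACGC")
print(sample, round(posterior, 3))
```

Because the classifier returns a posterior probability rather than a mismatch count, a caller could discard reads whose posterior falls below a chosen threshold instead of forcing an assignment; the threshold itself would be another assumption to tune per dataset.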