BARCOSEL: a tool for selecting an optimal barcode set for high-throughput sequencing.

Affiliations

Mathematical Biology Group, Department of Biosciences, FIN-00014 University of Helsinki, P.O.Box 65, Finland.

Holm Group, Institute of Biotechnology, FIN-00014 University of Helsinki, P.O.Box 56, Finland.

Publication information

BMC Bioinformatics. 2018 Jul 5;19(1):257. doi: 10.1186/s12859-018-2262-7.

Abstract

BACKGROUND

Current high-throughput sequencing platforms can sequence multiple samples in parallel. Samples are labeled by attaching a short, sample-specific nucleotide sequence, the barcode, to each DNA molecule before the libraries are pooled into a mix that is sequenced simultaneously. After sequencing, reads are assigned to samples by identifying the barcode sequence within each read. To tolerate sequencing errors, barcodes should be sufficiently far apart from each other in sequence space. A further constraint, arising from both nucleotide usage and base-calling accuracy, is that the proportions of the different nucleotides should be balanced at each barcode position. Because the number of samples mixed in each sequencing run varies, a sequencing core facility faces the problem of selecting the best subset of its available barcodes for each run. Plenty of tools exist for de novo barcode design, but they are not suited to subset selection.
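The two constraints above can be made concrete with a short sketch. It assumes equal-length barcodes, uses Hamming distance as the measure of how far apart two barcodes are, and scores balance per position as the largest deviation of any nucleotide's share from 1/4. These measures and the function names (`hamming`, `min_pairwise_distance`, `correctable_errors`, `position_imbalance`) are illustrative assumptions, not definitions taken from BARCOSEL.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance over all pairs in the set (needs >= 2 barcodes)."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def correctable_errors(barcodes):
    """Substitutions that nearest-barcode assignment can always correct:
    floor((d - 1) / 2) for minimum pairwise distance d."""
    return (min_pairwise_distance(barcodes) - 1) // 2


def position_imbalance(barcodes):
    """Assumed balance score (not BARCOSEL's definition): for each position,
    the largest deviation of any nucleotide's share from the ideal 1/4.
    0.0 means the position is perfectly balanced."""
    n = len(barcodes)
    scores = []
    for column in zip(*barcodes):
        counts = Counter(column)
        scores.append(max(abs(counts[base] / n - 0.25) for base in "ACGT"))
    return scores


if __name__ == "__main__":
    pool = ["ACGT", "CGTA", "GTAC", "TACG"]
    print(min_pairwise_distance(pool))  # 4
    print(correctable_errors(pool))     # 1
    print(position_imbalance(pool))     # [0.0, 0.0, 0.0, 0.0]
```

With a minimum pairwise distance of d, assigning each read to its nearest barcode corrects up to ⌊(d − 1)/2⌋ substitutions and detects up to d − 1, which is why the example set with d = 4 tolerates one error per barcode.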

RESULTS

We have developed a tool that can be used for three different tasks: 1) selecting an optimal barcode set from a larger set of candidates, 2) checking the compatibility of a user-defined set of barcodes, e.g. whether two or more libraries with existing barcodes can be combined in a single sequencing pool, and 3) augmenting an existing set of barcodes. In our approach, the selection process is formulated as a minimization problem: we define a cost function and a set of constraints and use integer programming to solve the resulting combinatorial problem. Given the desired number of barcodes and the set of candidate sequences supplied by the user, the necessary constraints are generated automatically and the optimal solution can be found. The method is implemented in the C programming language, and a web interface is available at http://ekhidna2.biocenter.helsinki.fi/barcosel .
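The three tasks described above can be illustrated with a single routine. BARCOSEL solves the selection with integer programming; the sketch below swaps in exhaustive enumeration of subsets, which reaches an optimum only for small candidate pools because the number of subsets grows combinatorially. The cost (squared deviation of per-position nucleotide counts from an even split), the hard minimum-Hamming-distance constraint, and all function names are assumptions chosen for illustration, not the paper's formulation.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of differing positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def imbalance_cost(barcodes):
    """Assumed cost: for each position and nucleotide, the squared gap between
    4 * count and the set size n (16x the squared deviation from n/4).
    Zero means every position uses A, C, G and T equally often."""
    n = len(barcodes)
    cost = 0
    for column in zip(*barcodes):
        counts = Counter(column)
        cost += sum((4 * counts[base] - n) ** 2 for base in "ACGT")
    return cost


def select_barcodes(candidates, k, min_dist, fixed=()):
    """Hypothetical exhaustive stand-in for the integer program.

    Chooses k barcodes that include every barcode in `fixed`, keep all
    pairwise Hamming distances >= min_dist, and minimise imbalance_cost.
    Returns (cost, barcodes) or None when no feasible set exists.
    """
    fixed = tuple(fixed)
    pool = [c for c in candidates if c not in fixed]
    need = k - len(fixed)
    if need < 0:
        raise ValueError("k is smaller than the number of fixed barcodes")
    best = None
    for extra in combinations(pool, need):  # every subset of the right size
        chosen = fixed + extra
        if any(hamming(a, b) < min_dist for a, b in combinations(chosen, 2)):
            continue  # violates the distance constraint
        cost = imbalance_cost(chosen)
        if best is None or cost < best[0]:
            best = (cost, chosen)
    return best


def is_compatible(barcodes, min_dist):
    """Task 2 (hypothetical helper): can these barcodes share one pool?"""
    return select_barcodes([], len(barcodes), min_dist, fixed=barcodes) is not None


def augment(existing, candidates, k, min_dist):
    """Task 3 (hypothetical helper): extend `existing` to k barcodes."""
    return select_barcodes(candidates, k, min_dist, fixed=existing)


if __name__ == "__main__":
    candidates = ["ACGT", "CGTA", "GTAC", "TACG", "AAAA", "ACGA"]
    print(select_barcodes(candidates, 4, 3))             # task 1
    print(is_compatible(["ACGT", "ACGA"], 3))            # task 2: False
    print(augment(["ACGT", "CGTA"], candidates, 4, 3))   # task 3
```

In this sketch, checking compatibility is the special case in which every requested barcode is fixed, and augmentation fixes the existing barcodes and lets the search fill the remaining slots; an integer-programming solver would express the same choices as binary selection variables instead of enumerating subsets.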

CONCLUSIONS

The increasing capacity of sequencing platforms raises the challenge of mixing barcodes. Our method lets the user select a given number of barcodes from a larger existing barcode set so that sequencing errors are tolerated and nucleotide balance is optimized. The tool is easily accessible via a web browser.
