Department of Integrative Biology, University of California Berkeley, Berkeley, CA, 94720, USA.
Department of Ecology, Behavior and Evolution, University of California San Diego, La Jolla, CA, 92093, USA.
J Mol Evol. 2023 Jun;91(3):263-280. doi: 10.1007/s00239-022-10083-z. Epub 2023 Jan 18.
Random DNA barcodes are a versatile tool for tracking cell lineages, with applications ranging from development to cancer to evolution. Here, we review and critically evaluate barcode designs as well as methods of barcode sequencing and initial processing of barcode data. We first demonstrate how various barcode design decisions affect data quality and propose a new design that balances all considerations that we are currently aware of. We then discuss various options for the preparation of barcode sequencing libraries, including inline indices and Unique Molecular Identifiers (UMIs). Finally, we test the performance of several established and new bioinformatic pipelines for the extraction of barcodes from raw sequencing reads and for error correction. We find that both alignment and regular expression-based approaches work well for barcode extraction, and that error-correction pipelines designed specifically for barcode data are superior to generic ones. Overall, this review will help researchers to approach their barcoding experiments in a deliberate and systematic way.
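To make the two processing steps named in the abstract concrete, the following is a minimal Python sketch of regular expression-based barcode extraction followed by a naive error-correction pass. It is not one of the pipelines evaluated in the review. The flanking sequences, the 12-nt barcode length, the example reads, and the function names (`extract_barcode`, `count_barcodes`, `merge_close_barcodes`) are all assumptions chosen for illustration, and the greedy Hamming-distance merge is a deliberately simple stand-in for a dedicated barcode error-correction tool.

```python
import re
from collections import Counter

# Hypothetical constant flanks around a 12-nt random barcode.
# These sequences and the length are illustrative assumptions only.
LEFT_FLANK = "ACGTAC"
RIGHT_FLANK = "TTGCAG"
BARCODE_PATTERN = re.compile(f"{LEFT_FLANK}([ACGT]{{12}}){RIGHT_FLANK}")


def extract_barcode(read):
    """Return the barcode found between the flanks, or None if absent."""
    match = BARCODE_PATTERN.search(read)
    return match.group(1) if match else None


def count_barcodes(reads):
    """Count how many reads carry each extracted barcode."""
    counts = Counter()
    for read in reads:
        barcode = extract_barcode(read)
        if barcode is not None:
            counts[barcode] += 1
    return counts


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def merge_close_barcodes(counts, max_dist=1):
    """Greedily fold each barcode into a more abundant one within max_dist.

    Barcodes are visited from most to least abundant, so a rare variant is
    absorbed by the first already-kept barcode that lies close enough.
    """
    merged = Counter()
    for barcode, n in counts.most_common():
        parent = next(
            (kept for kept in merged if hamming(kept, barcode) <= max_dist),
            None,
        )
        merged[parent if parent is not None else barcode] += n
    return merged


# Made-up reads: two exact copies, one single-mismatch variant,
# and one read with ambiguous bases that the pattern rejects.
reads = [
    "GGACGTACAAAACCCCGGGGTTGCAGTT",
    "ACGTACAAAACCCCGGGGTTGCAG",
    "ACGTACAAAACCCCGGGTTTGCAG",
    "ACGTACNNNNCCCCGGGGTTGCAG",
]

raw_counts = count_barcodes(reads)
corrected_counts = merge_close_barcodes(raw_counts)
print(raw_counts)
print(corrected_counts)
```

In this toy input the single-mismatch variant is absorbed into the more abundant barcode. A greedy merge of this kind can wrongly collapse genuinely distinct barcodes that happen to be close in sequence, which is one reason a purpose-built tool may be preferable for real data.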