Flexible parsing, interpretation, and editing of technical sequences with splitcode.

Author affiliations

UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA 90095, United States.

Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, United States.

Publication information

Bioinformatics. 2024 Jun 3;40(6). doi: 10.1093/bioinformatics/btae331.

DOI: 10.1093/bioinformatics/btae331

PMID: 38876979

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11193061/

Abstract

MOTIVATION

Next-generation sequencing libraries are constructed with numerous synthetic constructs such as sequencing adapters, barcodes, and unique molecular identifiers. Such sequences can be essential for interpreting results of sequencing assays, and when they contain information pertinent to an experiment, they must be processed and analyzed.

RESULTS

We present a tool called splitcode that enables flexible and efficient parsing, interpretation, and editing of sequencing reads. This versatile tool facilitates simple, reproducible preprocessing of reads from libraries constructed for a large array of single-cell and bulk sequencing assays.
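To make the idea of parsing and editing technical sequences concrete, the sketch below splits one read into a barcode, a UMI, and the biological insert, then trims a trailing adapter. It is a generic illustration, not splitcode's implementation or interface: the read layout ([barcode][UMI][insert][adapter]), the lengths, the adapter string, and the names `ParsedRead` and `parse_read` are all assumptions chosen for the example.

```python
from typing import NamedTuple, Optional


class ParsedRead(NamedTuple):
    """Hypothetical container for the pieces extracted from one read."""
    barcode: str
    umi: str
    insert: str


def parse_read(
    seq: str,
    barcode_len: int = 8,            # assumed barcode length
    umi_len: int = 6,                # assumed UMI length
    adapter: str = "AGATCGGAAGAGC",  # illustrative adapter sequence
) -> Optional[ParsedRead]:
    """Split a read assumed to be laid out as [barcode][UMI][insert][adapter...]."""
    prefix = barcode_len + umi_len
    if len(seq) < prefix:
        return None  # too short to hold both technical sequences

    barcode = seq[:barcode_len]
    umi = seq[barcode_len:prefix]
    rest = seq[prefix:]

    # Edit step: drop the adapter and everything after it, if it is present.
    cut = rest.find(adapter)
    insert = rest if cut == -1 else rest[:cut]
    return ParsedRead(barcode, umi, insert)


read = "ACGTACGT" + "TTGACA" + "GGCCTTAAGG" + "AGATCGGAAGAGC" + "AAAA"
parsed = parse_read(read)
print(parsed)
```

This sketch uses exact matching and fixed positions only; real libraries often need mismatch tolerance and variable-position tags, which is the kind of flexibility the abstract attributes to splitcode.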

AVAILABILITY AND IMPLEMENTATION

The splitcode program is available at http://github.com/pachterlab/splitcode.
