Trull Austyn, Worthey Elizabeth A, Ianov Lara
Institutional Research Core Program-Biological Data Science Core, University of Alabama at Birmingham, Birmingham, AL, 35233, United States.
Department of Genetics, University of Alabama at Birmingham, Birmingham, AL, 35233, United States.
Bioinformatics. 2025 Sep 1;41(9). doi: 10.1093/bioinformatics/btaf487.
Recent advances in long-read single-cell RNA sequencing (scRNA-seq) have enabled the quantification of full-length transcripts and isoforms at the single-cell level. Historically, long-read data had to be complemented with short-read single-cell data to overcome the higher sequencing error rate and correctly identify cellular barcodes and unique molecular identifiers. Improvements in Oxford Nanopore sequencing and the development of novel computational methods have removed this requirement. Although these methods now exist, the limited availability of modular and portable workflows remains a challenge.
Here, we present nf-core/scnanoseq, a secondary analysis pipeline for long-read single-cell and single-nuclei RNA sequencing data that delivers gene- and transcript-level quantification. The scnanoseq pipeline is implemented in Nextflow and built upon the nf-core framework, enabling portability across computational environments, scalability, and reproducibility of results across pipeline runs. The nf-core/scnanoseq workflow follows best practices for analyzing single-cell and single-nuclei data, performing barcode detection and correction, genome and transcriptome read alignment, unique molecular identifier deduplication, gene and transcript quantification, and extensive quality control reporting.
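To make two of the steps named above more concrete, the following is a minimal Python sketch of barcode correction against a whitelist followed by unique molecular identifier deduplication. It is an illustration only and is not the implementation used by nf-core/scnanoseq. The function names, the (barcode, UMI, gene) read representation, the one-mismatch Hamming threshold, and the exact-match UMI collapsing are all assumptions chosen for brevity.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed, whitelist, max_dist=1):
    """Return the whitelisted barcode for `observed`, or None.

    Exact matches are accepted as-is. Otherwise the barcode is corrected
    only if exactly one whitelist entry lies within `max_dist` mismatches;
    ambiguous or distant barcodes are rejected (hypothetical policy).
    """
    if observed in whitelist:
        return observed
    candidates = [
        bc for bc in whitelist
        if len(bc) == len(observed) and hamming(observed, bc) <= max_dist
    ]
    return candidates[0] if len(candidates) == 1 else None


def count_molecules(reads, whitelist):
    """Count distinct UMIs per (corrected barcode, gene) pair.

    `reads` is an iterable of (barcode, umi, gene) tuples. Reads whose
    barcode cannot be corrected are dropped; reads sharing a barcode,
    gene, and identical UMI are collapsed into one molecule.
    """
    umis = defaultdict(set)
    for barcode, umi, gene in reads:
        corrected = correct_barcode(barcode, whitelist)
        if corrected is None:
            continue
        umis[(corrected, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}
```

In this sketch, a read with barcode AAAT is assigned to the whitelisted barcode AAAA, and two reads from the same cell and gene that share a UMI contribute a single count. Real long-read data typically calls for more tolerant matching (for example, edit distance and error-aware UMI clustering), which this example deliberately omits.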
The source code and detailed documentation are freely available at https://github.com/nf-core/scnanoseq and https://nf-co.re/scnanoseq under the MIT License. Documentation for the version of nf-core/scnanoseq used for this paper, including default parameters and descriptions of output files, is available at https://nf-co.re/scnanoseq/1.1.0.