Giraffe: A tool for comprehensive processing and visualization of multiple long-read sequencing data.

Authors

Liu Xudong, Shao Yanwen, Guo Zhihao, Ni Ying, Sun Xuan, Leung Anskar Yu Hung, Li Runsheng

Affiliations

Department of Infectious Diseases and Public Health, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong, China.

Department of Biomedical Sciences, Jockey Club College of Veterinary Medicine and Life Sciences, City University of Hong Kong, Hong Kong, China.

Publication

Comput Struct Biotechnol J. 2024 Aug 9;23:3241-3246. doi: 10.1016/j.csbj.2024.08.003. eCollection 2024 Dec.

DOI: 10.1016/j.csbj.2024.08.003

PMID: 39279873

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11393587/

Abstract

Third-generation sequencing techniques have become increasingly popular due to their capacity to produce long, high-quality reads. Effective comparative analysis across various samples and sequencing platforms is essential for understanding biological mechanisms and establishing benchmark baselines. However, existing tools for long-read sequencing predominantly focus on quality control (QC) and processing for individual samples, complicating the comparison of multiple datasets. The lack of comprehensive tools for data comparison and visualization presents challenges for researchers with limited bioinformatics experience. To address this gap, we present Giraffe (https://github.com/lrslab/Giraffe_View), a Python3-based command-line tool designed for comparative analysis and visualization across diverse samples and platforms. Giraffe facilitates the assessment of read quality, sequencing bias, and genomic regional methylation proportions for both DNA and direct RNA sequencing reads. Its effectiveness has been demonstrated in various scenarios, including comparisons of sequencing methods (whole genome amplification vs. shotgun), sequencing platforms (Oxford Nanopore Technologies [ONT] vs. Pacific Biosciences [PacBio]), tissues (kidney marrow with and without blood), and biological replicates (kidney marrows).
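Read-quality assessment of long reads commonly starts from per-base Phred scores. The sketch below is a generic illustration and does not reproduce Giraffe's own code. It assumes FASTQ qualities in Phred+33 encoding, and the helper names (`quals_from_fastq`, `estimated_read_accuracy`, `mean_read_qscore`) are hypothetical. It converts each score to an error probability, averages those probabilities, and reports the result both as an estimated accuracy and as a read-level Q score.

```python
import math


def quals_from_fastq(qual_line: str) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding (hypothetical helper)."""
    return [ord(c) - 33 for c in qual_line]


def phred_to_error(q: int) -> float:
    """Error probability implied by one Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def _mean_error(quals: list[int]) -> float:
    if not quals:
        raise ValueError("read has no quality scores")
    # Average the error probabilities, not the raw Q values,
    # because the Phred scale is logarithmic.
    return sum(phred_to_error(q) for q in quals) / len(quals)


def estimated_read_accuracy(quals: list[int]) -> float:
    """Estimated accuracy of a read: 1 minus its mean per-base error probability."""
    return 1.0 - _mean_error(quals)


def mean_read_qscore(quals: list[int]) -> float:
    """Read-level Q score obtained by converting the mean error probability back to Phred scale."""
    return -10 * math.log10(_mean_error(quals))


if __name__ == "__main__":
    quals = quals_from_fastq("+5?")  # decodes to Q10, Q20, Q30
    print(quals, estimated_read_accuracy(quals), mean_read_qscore(quals))
```

The choice of averaging matters. For qualities 10, 20, and 30 the arithmetic mean Q is 20, whereas the mean error probability of 0.037 corresponds to roughly Q14, a noticeably lower read quality.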
