TieBrush: an efficient method for aggregating and summarizing mapped reads across large datasets.

Author information

Varabyou Ales, Pertea Geo, Pockrandt Christopher, Pertea Mihaela

Affiliations

Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA.

Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA.

Publication information

Bioinformatics. 2021 Oct 25;37(20):3650-3651. doi: 10.1093/bioinformatics/btab342.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8545345/

Abstract

SUMMARY

Although the ability to programmatically summarize and visually inspect sequencing data is an integral part of genome analysis, currently available methods are not capable of handling large numbers of samples. In particular, making a visual comparison of transcriptional landscapes between two sets of thousands of RNA-seq samples is limited by available computational resources, which can be overwhelmed due to the sheer size of the data. In this work, we present TieBrush, a software package designed to process very large sequencing datasets (RNA, whole-genome, exome, etc.) into a form that enables quick visual and computational inspection. TieBrush can also be used as a method for aggregating data for downstream computational analysis, and is compatible with most software tools that take aligned reads as input.
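The abstract does not describe how TieBrush aggregates reads internally. To illustrate the general idea of summarizing mapped reads across many samples, the following Python sketch collapses alignments that share the same reference, start position, strand and CIGAR string into one record. Each record carries a total read count and the number of samples it occurs in. The sketch then derives per-base coverage from the collapsed records. The `Alignment` record, the choice of key fields, and the `aggregate` and `coverage` functions are hypothetical simplifications made for this example. They are not TieBrush's API or output format, and a real tool would read SAM/BAM/CRAM input rather than in-memory tuples.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Alignment(NamedTuple):
    """Hypothetical minimal record for one aligned read."""
    ref: str     # reference sequence name
    pos: int     # 1-based leftmost reference position
    strand: str  # '+' or '-'
    cigar: str   # CIGAR string, e.g. "2M3N2M" for a spliced read


Key = Tuple[str, int, str, str]


def aggregate(samples: Dict[str, Iterable[Alignment]]) -> Dict[Key, Tuple[int, int]]:
    """Collapse identical alignments across all samples.

    Maps each distinct (ref, pos, strand, cigar) key to
    (total read count, number of samples containing it).
    """
    read_counts: Counter = Counter()
    sample_sets: Dict[Key, set] = defaultdict(set)
    for sample_id, alignments in samples.items():
        for aln in alignments:
            key = (aln.ref, aln.pos, aln.strand, aln.cigar)
            read_counts[key] += 1
            sample_sets[key].add(sample_id)
    # Sort keys so the output is ordered by reference, then position.
    return {k: (read_counts[k], len(sample_sets[k])) for k in sorted(read_counts)}


CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def coverage(summary: Dict[Key, Tuple[int, int]], ref: str) -> Dict[int, int]:
    """Per-base read coverage on `ref`, weighting each collapsed record by its read count.

    Aligned bases (M, =, X) add coverage; D and N advance along the
    reference without adding coverage; I, S, H and P do not consume it.
    """
    depth: Counter = Counter()
    for (r, pos, _strand, cigar), (n_reads, _n_samples) in summary.items():
        if r != ref:
            continue
        cur = pos
        for length_str, op in CIGAR_OP.findall(cigar):
            length = int(length_str)
            if op in "M=X":
                for p in range(cur, cur + length):
                    depth[p] += n_reads
                cur += length
            elif op in "DN":
                cur += length
    return dict(depth)


# Usage: two toy samples; s1 carries the same alignment twice.
samples = {
    "s1": [Alignment("chr1", 100, "+", "5M"), Alignment("chr1", 100, "+", "5M")],
    "s2": [Alignment("chr1", 100, "+", "5M"), Alignment("chr1", 103, "-", "2M3N2M")],
}
summary = aggregate(samples)
# {('chr1', 100, '+', '5M'): (3, 2), ('chr1', 103, '-', '2M3N2M'): (1, 1)}
cov = coverage(summary, "chr1")
# positions 100-102 -> 3, 103-104 -> 4, 105-107 skipped (intron), 108-109 -> 1
```

In this sketch, identical alignments from many samples are stored once with a count. The summary therefore grows with the number of distinct alignments rather than with the total number of reads, and coverage can still be recomputed from it exactly.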

AVAILABILITY AND IMPLEMENTATION

TieBrush is provided as a C++ package under the MIT License. Precompiled binaries, source code and example data are available on GitHub (https://github.com/alevar/tiebrush).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
