riboCleaner: a pipeline to identify and quantify rRNA read contamination from RNA-seq data in plants.

Affiliation

Computational Biology, BASF Corporation, Research Triangle Park, NC 27709-3528, USA.

Publication information

Bioinformatics. 2022 Aug 2;38(15):3840-3843. doi: 10.1093/bioinformatics/btac402.

Abstract

MOTIVATION

Analysis of gene expression data can be crucial for elucidating biological relationships within living organisms. However, accurate quantification of gene expression relies directly upon the accuracy of the reference genome or transcriptome to which the expression data are mapped. Errors in gene annotation can lead to errors in the quantification of gene expression. One source of gene annotation error in eukaryotes arises from incorrect predictions of messenger RNA gene models within ribosomal DNA (rDNA) regions.

RESULTS

Here, we provide examples of how the presence of false gene models in rDNA regions can result in a handful of genes appearing to contribute to >50% of the total transcripts per million values of entire RNA-seq datasets. To this end, we have created riboCleaner, a bioinformatics pipeline designed to identify misannotated gene models in rDNA regions and quantify rRNA-derived reads in RNA-seq data. We also show the applicability of riboCleaner in several plant genome assemblies.
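The effect described above follows from how transcripts per million (TPM) is computed: each gene's read count is divided by its length, and the resulting rates are rescaled to sum to one million. The minimal sketch below is not riboCleaner's implementation. It only illustrates the arithmetic, assuming a hypothetical `GeneModel` record, toy coordinates and counts, and a hand-specified rDNA interval. It also uses genomic span in place of effective transcript length for simplicity. A single gene model that collects rRNA-derived reads ends up holding most of the TPM budget.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """A hypothetical annotated gene model with an assigned read count."""
    gene_id: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    count: int  # reads assigned to this gene


def tpm(genes):
    """Transcripts per million: length-normalise counts, then rescale to sum to 1e6.

    Simplification: genomic span is used as the length, not effective transcript length.
    """
    rates = {g.gene_id: g.count / (g.end - g.start) for g in genes}
    total = sum(rates.values())
    return {gid: rate / total * 1e6 for gid, rate in rates.items()}


def overlaps_any(gene, intervals):
    """True if the gene overlaps any (chrom, start, end) half-open interval."""
    return any(
        gene.chrom == chrom and gene.start < end and start < gene.end
        for chrom, start, end in intervals
    )


def rdna_tpm_share(genes, rdna_intervals):
    """Return IDs of gene models overlapping rDNA and their combined fraction of total TPM."""
    values = tpm(genes)
    flagged = [g.gene_id for g in genes if overlaps_any(g, rdna_intervals)]
    share = sum(values[gid] for gid in flagged) / 1e6
    return flagged, share


# Toy data (hypothetical): two ordinary genes and one gene model inside an rDNA repeat.
genes = [
    GeneModel("geneA", "chr1", 0, 1000, 100),
    GeneModel("geneB", "chr1", 5000, 6000, 100),
    GeneModel("geneR", "chr2", 100, 1100, 2000),
]
rdna = [("chr2", 0, 5000)]

flagged, share = rdna_tpm_share(genes, rdna)
print(flagged, f"{share:.1%}")  # ['geneR'] 90.9%
```

Because TPM values always sum to one million, every rRNA-derived read assigned to a false gene model lowers the TPM of all other genes in the sample, not just the flagged one.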

AVAILABILITY AND IMPLEMENTATION

We have implemented riboCleaner as a containerized Snakemake workflow. The workflow, instructions for building the container and other documentation are available at https://github.com/basf. The data underlying this article are available in GitHub at https://github.com/basf/riboCleaner. For convenience, a prebuilt Docker image containing riboCleaner is available at https://hub.docker.com/u/basfcontainers.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
