Read Mapping and Transcript Assembly: A Scalable and High-Throughput Workflow for the Processing and Analysis of Ribonucleic Acid Sequencing Data.

Authors

Peri Sateesh, Roberts Sarah, Kreko Isabella R, McHan Lauren B, Naron Alexandra, Ram Archana, Murphy Rebecca L, Lyons Eric, Gregory Brian D, Devisetty Upendra K, Nelson Andrew D L

Affiliations

Genetics Graduate Interdisciplinary Group, University of Arizona, Tucson, AZ, United States.

CyVerse, University of Arizona, Tucson, AZ, United States.

Publication information

Front Genet. 2020 Jan 24;10:1361. doi: 10.3389/fgene.2019.01361. eCollection 2019.

DOI:10.3389/fgene.2019.01361
PMID:32038716
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6993073/
Abstract

Next-generation RNA-sequencing is an incredibly powerful means of generating a snapshot of the transcriptomic state within a cell, tissue, or whole organism. As the questions addressed by RNA-sequencing (RNA-seq) become both more complex and greater in number, there is a need to simplify RNA-seq processing workflows, make them more efficient and interoperable, and make them capable of handling both large and small datasets. This is especially important for researchers who need to process hundreds to tens of thousands of RNA-seq datasets. To address these needs, we have developed a scalable, user-friendly, and easily deployable analysis suite called RMTA (Read Mapping, Transcript Assembly). RMTA can easily process thousands of RNA-seq datasets with features that include automated read quality analysis, filters for lowly expressed transcripts, and read counting for differential expression analysis. RMTA is containerized using Docker for easy deployment within any compute environment [cloud, local, or high-performance computing (HPC)] and is available as two apps in CyVerse's Discovery Environment, one for normal use and one specifically designed for introducing undergraduates and high school students to RNA-seq analysis. For extremely large datasets (tens of thousands of FASTQ files) we developed a high-throughput, scalable, and parallelized version of RMTA optimized for launching on the Open Science Grid (OSG) from within the Discovery Environment. OSG-RMTA allows users to utilize the Discovery Environment for data management, parallelization, and submitting jobs to OSG, and finally, to employ the OSG for distributed, high-throughput computing. Alternatively, OSG-RMTA can be run directly on the OSG through the command line. RMTA is designed to be useful for data scientists of any skill level who are interested in rapidly and reproducibly analyzing their large RNA-seq datasets.
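
Two of the ideas in the abstract, splitting a large set of FASTQ files into independent jobs and filtering out lowly expressed transcripts, can be illustrated with a minimal Python sketch. This is not RMTA's implementation or interface: the function names, the batch size, the abundance values, and the threshold below are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterator, List


def batch_samples(fastq_files: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of FASTQ paths; each batch could be one independent job."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(fastq_files), batch_size):
        yield fastq_files[start:start + batch_size]


def filter_low_expression(abundance: Dict[str, float], min_value: float) -> Dict[str, float]:
    """Keep only transcripts whose abundance is at or above min_value."""
    return {tid: value for tid, value in abundance.items() if value >= min_value}


# Hypothetical inputs: seven sample files and three transcript abundances.
files = [f"sample_{i}.fastq.gz" for i in range(1, 8)]
batches = list(batch_samples(files, batch_size=3))
kept = filter_low_expression({"tx1": 0.2, "tx2": 5.0, "tx3": 1.0}, min_value=1.0)
```

Because each batch shares no state with the others, batches of this kind can be dispatched to separate workers, which is the general property that distributed high-throughput systems rely on.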

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/542a/6993073/e330b4572a21/fgene-10-01361-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/542a/6993073/d88f147c3491/fgene-10-01361-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/542a/6993073/2cefbfaaabbd/fgene-10-01361-g002.jpg

Similar articles

1
Read Mapping and Transcript Assembly: A Scalable and High-Throughput Workflow for the Processing and Analysis of Ribonucleic Acid Sequencing Data.
Front Genet. 2020 Jan 24;10:1361. doi: 10.3389/fgene.2019.01361. eCollection 2019.
2
OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid.
Bioinform Biol Insights. 2016 Aug 2;10:133-41. doi: 10.4137/BBI.S38193. eCollection 2016.
3
L-RAPiT: A Cloud-Based Computing Pipeline for the Analysis of Long-Read RNA Sequencing Data.
Int J Mol Sci. 2022 Dec 13;23(24):15851. doi: 10.3390/ijms232415851.
4
Cloud accelerated alignment and assembly of full-length single-cell RNA-seq data using Falco.
BMC Genomics. 2019 Dec 30;20(Suppl 10):927. doi: 10.1186/s12864-019-6341-6.
5
GEMmaker: process massive RNA-seq datasets on heterogeneous computational infrastructure.
BMC Bioinformatics. 2022 May 2;23(1):156. doi: 10.1186/s12859-022-04629-7.
6
CIPHER: a flexible and extensive workflow platform for integrative next-generation sequencing data analysis and genomic regulatory element prediction.
BMC Bioinformatics. 2017 Aug 8;18(1):363. doi: 10.1186/s12859-017-1770-1.
7
SPEAQeasy: a scalable pipeline for expression analysis and quantification for R/bioconductor-powered RNA-seq analyses.
BMC Bioinformatics. 2021 May 1;22(1):224. doi: 10.1186/s12859-021-04142-3.
8
FastqPuri: high-performance preprocessing of RNA-seq data.
BMC Bioinformatics. 2019 May 3;20(1):226. doi: 10.1186/s12859-019-2799-0.
9
Watchdog - a workflow management system for the distributed analysis of large-scale experimental data.
BMC Bioinformatics. 2018 Mar 13;19(1):97. doi: 10.1186/s12859-018-2107-4.
10
PhytoPipe: a phytosanitary pipeline for plant pathogen detection and diagnosis using RNA-seq data.
BMC Bioinformatics. 2023 Dec 13;24(1):470. doi: 10.1186/s12859-023-05589-2.

Cited by

1
Transcript profiling of plastid ferrochelatase two mutants reveals that chloroplast singlet oxygen signals lead to global changes in RNA profiles and are mediated by Plant U-Box 4.
BMC Plant Biol. 2025 Jun 3;25(1):747. doi: 10.1186/s12870-025-06703-7.
2
Long intergenic non-coding RNAs modulate proximal protein-coding gene expression and tolerance to Candidatus Liberibacter spp. in potatoes.
Commun Biol. 2024 Sep 6;7(1):1095. doi: 10.1038/s42003-024-06763-9.
3
Cotton Meristem Transcriptomes: Constructing an RNA-Seq Pipeline to Explore Crop Architecture Regulation.
Methods Mol Biol. 2024;2812:215-233. doi: 10.1007/978-1-0716-3886-6_12.
4
Regulation of a single inositol 1-phosphate synthase homeologue by HSFA6B contributes to fibre yield maintenance under drought conditions in upland cotton.
Plant Biotechnol J. 2024 Oct;22(10):2756-2772. doi: 10.1111/pbi.14402. Epub 2024 Jun 21.
5
Transcript profiling of mutants reveals that chloroplast singlet oxygen signals lead to global changes in RNA profiles and are mediated by Plant U-Box 4.
bioRxiv. 2024 Nov 26:2024.05.13.593788. doi: 10.1101/2024.05.13.593788.
6
Low-Protein Diets Composed of Protein Recovered from Food Processing Supported Growth, but Induced Mild Hepatic Steatosis Compared with a No-Protein Diet in Young Female Rats.
J Nutr. 2023 Jun;153(6):1668-1679. doi: 10.1016/j.tjnut.2023.03.028. Epub 2023 Mar 27.
7
Telomerase RNA in Hymenoptera (Insecta) switched to plant/ciliate-like biogenesis.
Nucleic Acids Res. 2023 Jan 11;51(1):420-433. doi: 10.1093/nar/gkac1202.
8
Evolutionary analysis of the LORELEI gene family in plants reveals regulatory subfunctionalization.
Plant Physiol. 2022 Nov 28;190(4):2539-2556. doi: 10.1093/plphys/kiac444.
9
Identification and functional annotation of long intergenic non-coding RNAs in Brassicaceae.
Plant Cell. 2022 Aug 25;34(9):3233-3260. doi: 10.1093/plcell/koac166.
10
Evolution of plant telomerase RNAs: farther to the past, deeper to the roots.
Nucleic Acids Res. 2021 Jul 21;49(13):7680-7694. doi: 10.1093/nar/gkab545.

References

1
Alignment and mapping methodology influence transcript abundance estimation.
Genome Biol. 2020 Sep 7;21(1):239. doi: 10.1186/s13059-020-02151-8.
2
CoGe LoadExp+: A web-based suite that integrates next-generation sequencing data analysis workflows and visualization.
Plant Direct. 2017 Jul 20;1(2). doi: 10.1002/pld3.8. eCollection 2017 Jul.
3
A transcriptome-wide association study of high-grade serous epithelial ovarian cancer identifies new susceptibility genes and splice variants.
Nat Genet. 2019 May;51(5):815-823. doi: 10.1038/s41588-019-0395-x. Epub 2019 May 1.
4
N6-Methyladenosine Inhibits Local Ribonucleolytic Cleavage to Stabilize mRNAs in Arabidopsis.
Cell Rep. 2018 Oct 30;25(5):1146-1157.e3. doi: 10.1016/j.celrep.2018.10.020.
5
RSEQREP: RNA-Seq Reports, an open-source cloud-enabled framework for reproducible RNA-Seq data processing, analysis, and result reporting.
F1000Res. 2017 Dec 21;6:2162. doi: 10.12688/f1000research.13049.2. eCollection 2017.
6
Massive mining of publicly available RNA-seq data from human and mouse.
Nat Commun. 2018 Apr 10;9(1):1366. doi: 10.1038/s41467-018-03751-6.
7
EPIC-CoGe: managing and analyzing genomic data.
Bioinformatics. 2018 Aug 1;34(15):2651-2653. doi: 10.1093/bioinformatics/bty106.
8
Deciphering genetic factors that determine melon fruit-quality traits using RNA-Seq-based high-resolution QTL and eQTL mapping.
Plant J. 2018 Apr;94(1):169-191. doi: 10.1111/tpj.13838.
9
The Cancer Genomics Cloud: Collaborative, Reproducible, and Democratized-A New Paradigm in Large-Scale Computational Research.
Cancer Res. 2017 Nov 1;77(21):e3-e6. doi: 10.1158/0008-5472.CAN-17-0387.
10
Evolinc: A Tool for the Identification and Evolutionary Comparison of Long Intergenic Non-coding RNAs.
Front Genet. 2017 May 9;8:52. doi: 10.3389/fgene.2017.00052. eCollection 2017.