CAFU: a Galaxy framework for exploring unmapped RNA-Seq data.

Author information

Chen Siyuan, Ren Chengzhi, Zhai Jingjing, Yu Jiantao, Zhao Xuyang, Li Zelong, Zhang Ting, Ma Wenlong, Han Zhaoxue, Ma Chuang

Affiliations

State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest Agriculture and Forestry University.

College of Information Engineering, Northwest Agriculture and Forestry University.

Publication information

Brief Bioinform. 2020 Mar 23;21(2):676-686. doi: 10.1093/bib/bbz018.

DOI: 10.1093/bib/bbz018

PMID: 30815667

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7299299/

Abstract

A widely used approach in transcriptome analysis is the alignment of short reads to a reference genome. However, owing to the deficiencies of specially designed analytical systems, short reads unmapped to the genome sequence are usually ignored, resulting in the loss of significant biological information and insights. To fill this gap, we present Comprehensive Assembly and Functional annotation of Unmapped RNA-Seq data (CAFU), a Galaxy-based framework that can facilitate the large-scale analysis of unmapped RNA sequencing (RNA-Seq) reads from single- and mixed-species samples. By taking advantage of machine learning techniques, CAFU addresses the issue of accurately identifying the species origin of transcripts assembled using unmapped reads from mixed-species samples. CAFU also represents an innovation in that it provides a comprehensive collection of functions required for transcript confidence evaluation, coding potential calculation, sequence and expression characterization and function annotation. These functions and their dependencies have been integrated into a Galaxy framework that provides access to CAFU via a user-friendly interface, dramatically simplifying complex exploration tasks involving unmapped RNA-Seq reads. CAFU has been validated with RNA-Seq data sets from wheat and Zea mays (maize) samples. CAFU is freely available via GitHub: https://github.com/cma2015/CAFU.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/07cc/7299299/8a287a1a415c/bbz018f1.jpg

Similar articles

Another lesson from unmapped reads: in-depth analysis of RNA-Seq reads from various horse tissues.

J Appl Genet. 2022 Sep;63(3):571-581. doi: 10.1007/s13353-022-00705-z. Epub 2022 Jun 7.

TopHat-Recondition: a post-processor for TopHat unmapped reads.

BMC Bioinformatics. 2016 May 4;17(1):199. doi: 10.1186/s12859-016-1058-x.

RNA-Seq in Nonmodel Organisms.

Methods Mol Biol. 2021;2243:143-167. doi: 10.1007/978-1-0716-1103-6_8.

FX: an RNA-Seq analysis tool on the cloud.

Bioinformatics. 2012 Mar 1;28(5):721-3. doi: 10.1093/bioinformatics/bts023. Epub 2012 Jan 17.

Exploring the unmapped DNA and RNA reads in a songbird genome.

BMC Genomics. 2019 Jan 8;20(1):19. doi: 10.1186/s12864-018-5378-2.

SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.

BMC Bioinformatics. 2016 Feb 4;17:66. doi: 10.1186/s12859-016-0923-y.

Optimizing RNA-Seq Mapping with STAR.

Methods Mol Biol. 2016;1415:245-62. doi: 10.1007/978-1-4939-3572-7_13.

Baiting out a full length sequence from unmapped RNA-seq data.

BMC Genomics. 2021 Nov 27;22(1):857. doi: 10.1186/s12864-021-08146-4.

Transcriptomic Analysis of C. elegans RNA Sequencing Data Through the Tuxedo Suite on the Galaxy Project.

J Vis Exp. 2017 Apr 8(122):55473. doi: 10.3791/55473.

Cited by

Changes in m6A RNA methylation are associated with male sterility in wolfberry.

BMC Plant Biol. 2023 Sep 29;23(1):456. doi: 10.1186/s12870-023-04458-7.

Design, execution, and interpretation of plant RNA-seq analyses.

Front Plant Sci. 2023 Jun 30;14:1135455. doi: 10.3389/fpls.2023.1135455. eCollection 2023.

Baiting out a full length sequence from unmapped RNA-seq data.

BMC Genomics. 2021 Nov 27;22(1):857. doi: 10.1186/s12864-021-08146-4.

Interactive Web-based Annotation of Plant MicroRNAs with iwa-miRNA.

Genomics Proteomics Bioinformatics. 2022 Jun;20(3):557-567. doi: 10.1016/j.gpb.2021.02.010. Epub 2021 Jul 28.

Comparative RNA-Seq transcriptome analyses reveal dynamic time-dependent effects of Fe, O, and Si irradiation on the induction of murine hepatocellular carcinoma.

BMC Genomics. 2020 Jul 1;21(1):453. doi: 10.1186/s12864-020-06869-4.

Optimized sequencing depth and de novo assembler for deeply reconstructing the transcriptome of the tea plant, an economically important plant species.

BMC Bioinformatics. 2019 Nov 6;20(1):553. doi: 10.1186/s12859-019-3166-x.
