ExUTR: a novel pipeline for large-scale prediction of 3'-UTR sequences from NGS data.

Affiliation

UCD School of Biology and Environmental Science, University College Dublin, Belfield, Dublin 4, Ireland.

Publication information

BMC Genomics. 2017 Nov 6;18(1):847. doi: 10.1186/s12864-017-4241-1.

Abstract

BACKGROUND

The three prime untranslated region (3'-UTR) plays a pivotal role in modulating gene expression by determining the fate of mRNA. Many crucial developmental events, such as mammalian spermatogenesis, tissue patterning, sex determination and neurogenesis, rely heavily on post-transcriptional regulation by the 3'-UTR. However, 3'-UTR biology remains a relatively untapped field, with only limited tools and 3'-UTR resources available. To elucidate how the 3'-UTR regulates gene expression, the 3'-UTR sequences must first be identified. Current 3'-UTR mining tools, such as GETUTR, 3USS and UTRscan, all depend on a well-annotated reference genome or curated 3'-UTR sequences, which hinders their application to the many non-model organisms for which genomes are not available. To address these issues, an NGS-based, automated pipeline is urgently needed for genome-wide 3'-UTR prediction in the absence of reference genomes.

RESULTS

Here, we propose ExUTR, a novel NGS-based pipeline to predict and retrieve 3'-UTR sequences from RNA-Seq experiments, designed particularly for non-model species lacking well-annotated genomes. The pipeline integrates cutting-edge bioinformatics tools, databases (UniProt and UTRdb) and novel in-house Perl scripts into a fully automated workflow. Taking transcriptome assemblies as input, it identifies 3'-UTR signals based primarily on the intrinsic features of transcripts, and outputs predicted 3'-UTR candidates together with associated annotations. In addition, ExUTR requires only minimal computational resources, so it can run on a standard desktop computer with a reasonable runtime, making it affordable for most laboratories. We also demonstrate the functionality and extensibility of the pipeline using publicly available RNA-Seq data from both model and non-model species, and further validate the accuracy of the predicted 3'-UTRs using both well-characterized 3'-UTR resources and 3P-Seq data.
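The abstract does not spell out how 3'-UTR signals are derived from the "intrinsic features of transcripts". As a rough illustration only, the sketch below assumes one simple interpretation: find the longest complete open reading frame (ATG to stop codon) on the forward strand of an assembled transcript and treat everything downstream of its stop codon as the 3'-UTR candidate. The function names `longest_orf` and `candidate_3utr` and the `min_len` parameter are hypothetical and are not taken from ExUTR itself, which is implemented in Perl and also draws on UniProt and UTRdb.

```python
# Hypothetical sketch: NOT the ExUTR implementation, only an illustration
# of extracting a 3'-UTR candidate from a transcript's own sequence.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq):
    """Return (start, end) of the longest ATG-to-stop ORF on the forward
    strand, where end is the index just past the stop codon.
    Returns None when no complete ORF is found."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or (end - start) > (best[1] - best[0]):
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best


def candidate_3utr(seq, min_len=1):
    """Return the sequence downstream of the longest ORF's stop codon,
    or None if there is no complete ORF or the tail is shorter than
    min_len (an assumed, illustrative threshold)."""
    orf = longest_orf(seq)
    if orf is None:
        return None
    utr = seq.upper()[orf[1]:]
    return utr if len(utr) >= min_len else None


if __name__ == "__main__":
    transcript = "GGATGAAACCCTAAGCTTTAATAAAGC"
    print(longest_orf(transcript))
    print(candidate_3utr(transcript))
```

A real workflow would also need to handle transcript orientation, incomplete ORFs at assembly ends, and homology evidence for the coding region, none of which this sketch attempts.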

CONCLUSIONS

ExUTR is a practical and powerful workflow that enables rapid genome-wide 3'-UTR discovery from NGS data. The candidates predicted by this pipeline will further advance the study of miRNA target prediction, cis elements in the 3'-UTR, and the evolution and biology of 3'-UTRs. Its independence from a well-annotated reference genome will dramatically expand its application to a much broader research area, encompassing all species for which RNA-Seq data are available.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/557c/5674806/c04307a83942/12864_2017_4241_Fig1_HTML.jpg