Investigating Open Reading Frames in Known and Novel Transcripts using ORFanage.

Author information

Varabyou Ales, Erdogdu Beril, Salzberg Steven L, Pertea Mihaela

Affiliations

Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21211, USA.

Department of Computer Science, Johns Hopkins University, Baltimore, MD 21211, USA.

Publication information

bioRxiv. 2023 Mar 25:2023.03.23.533704. doi: 10.1101/2023.03.23.533704.

DOI:10.1101/2023.03.23.533704

PMID:36993373

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10055401/

Abstract

ORFanage is a system designed to assign open reading frames (ORFs) to both known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing (RNA-seq) experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of ORFs in tens of thousands of transcript models in the RefSeq and GENCODE human annotation databases. Through its implementation of a highly accurate and efficient pseudo-alignment algorithm, ORFanage is substantially faster than other ORF annotation methods, enabling its application to very large datasets. When used to analyze transcriptome assemblies, ORFanage can aid in the separation of signal from transcriptional noise and the identification of likely functional transcript variants, ultimately advancing our understanding of biology and medicine.
