Inference of isoforms from short sequence reads.

Authors

Feng Jianxing, Li Wei, Jiang Tao

Affiliation

School of Life Sciences and Technology, Tongji University, China.

Publication information

J Comput Biol. 2011 Mar;18(3):305-21. doi: 10.1089/cmb.2010.0243.

Abstract

Because of alternative splicing in eukaryotic species, identifying mRNA isoforms (or splicing variants) is a difficult problem. Traditional experimental methods for this purpose are time-consuming and costly. The emerging RNA-Seq technology offers a potentially effective way to address this problem. Although many studies have confirmed the advantages of RNA-Seq over traditional methods in transcriptome analysis, inferring isoforms from millions of short sequence reads (e.g., Illumina/Solexa reads) remains computationally challenging. In this work, we propose a method that calculates the expression levels of isoforms and infers isoforms from short RNA-Seq reads using exon-intron boundary, transcription start site (TSS), and poly-A site (PAS) information. We first formulate the relationship among exons, isoforms, and single-end reads as a convex quadratic program, and then use an efficient algorithm (called IsoInfer) to search for isoforms. IsoInfer can calculate the expression levels of isoforms accurately when all the isoforms are known, and it can also infer novel isoforms from scratch. Our experimental tests on known mouse isoforms, with both expression levels and reads simulated, demonstrate that IsoInfer calculates the expression levels of isoforms with an accuracy comparable to the state-of-the-art statistical method while running 60 times faster. Moreover, our tests on both simulated and real reads show that it achieves good precision and sensitivity in inferring isoforms when given accurate exon-intron boundary, TSS, and PAS information, especially for isoforms with significantly high expression levels. The software is freely available at http://www.cs.ucr.edu/~jianxing/IsoInfer.html.
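To make the idea of a convex quadratic program for isoform expression concrete, here is a minimal sketch. It is not the IsoInfer formulation or code, which the abstract does not spell out. It assumes a simplified model in which each isoform is a 0/1 vector over exons, expression is measured in reads per base, and the expected read count on an exon is its length times the summed expression of the isoforms that contain it. The function name `estimate_expression`, its parameters, and the projected-gradient solver are all illustrative choices.

```python
def estimate_expression(isoforms, exon_lengths, read_counts, iterations=5000):
    """Non-negative least squares fit of isoform expression (illustrative only).

    Minimizes sum_j (exon_lengths[j] * sum_i isoforms[i][j] * x[i] - read_counts[j])**2
    subject to x[i] >= 0, which is a convex quadratic program.
    Solved here with plain projected gradient descent (an assumed solver choice).
    """
    n_iso = len(isoforms)
    n_exon = len(exon_lengths)
    # Design matrix: m[j][i] is the expected reads on exon j per unit expression of isoform i.
    m = [[exon_lengths[j] * isoforms[i][j] for i in range(n_iso)] for j in range(n_exon)]
    # The squared Frobenius norm bounds the largest eigenvalue of m^T m,
    # so 1 / (2 * frobenius_sq) is a safe step size for the gradient 2 * m^T (m x - d).
    frobenius_sq = sum(v * v for row in m for v in row)
    step = 1.0 / (2.0 * frobenius_sq)
    x = [0.0] * n_iso
    for _ in range(iterations):
        residual = [sum(m[j][i] * x[i] for i in range(n_iso)) - read_counts[j]
                    for j in range(n_exon)]
        grad = [2.0 * sum(m[j][i] * residual[j] for j in range(n_exon))
                for i in range(n_iso)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[i] - step * grad[i]) for i in range(n_iso)]
    return x


if __name__ == "__main__":
    # Hypothetical gene with three exons; isoform B skips the middle exon.
    isoform_a = [1, 1, 1]
    isoform_b = [1, 0, 1]
    x = estimate_expression([isoform_a, isoform_b], [100, 50, 100], [300, 100, 300])
    print([round(v, 3) for v in x])
```

In this toy input the read counts were generated from expression levels of 2 and 1 reads per base for the two isoforms, so the fit should recover values close to those. A practical implementation would also need to handle reads spanning exon junctions and noise in the counts, which this sketch ignores.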
