Inference of isoforms from short sequence reads.

Authors

Feng Jianxing, Li Wei, Jiang Tao

Affiliation

School of Life Sciences and Technology, Tongji University, China.

Publication

J Comput Biol. 2011 Mar;18(3):305-21. doi: 10.1089/cmb.2010.0243.

DOI: 10.1089/cmb.2010.0243
PMID: 21385036
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3123862/
Abstract

Due to alternative splicing events in eukaryotic species, the identification of mRNA isoforms (or splicing variants) is a difficult problem. Traditional experimental methods for this purpose are time-consuming and costly. The emerging RNA-Seq technology offers a potentially effective way to address this problem. Although the advantages of RNA-Seq over traditional methods in transcriptome analysis have been confirmed by many studies, the inference of isoforms from millions of short sequence reads (e.g., Illumina/Solexa reads) has remained computationally challenging. In this work, we propose a method to calculate the expression levels of isoforms and infer isoforms from short RNA-Seq reads using exon-intron boundary, transcription start site (TSS), and poly-A site (PAS) information. We first formulate the relationship among exons, isoforms, and single-end reads as a convex quadratic program, and then use an efficient algorithm (called IsoInfer) to search for isoforms. IsoInfer can calculate the expression levels of isoforms accurately if all the isoforms are known, and it can also infer novel isoforms from scratch. Our experimental tests on known mouse isoforms with both simulated expression levels and reads demonstrate that IsoInfer calculates the expression levels of isoforms with an accuracy comparable to the state-of-the-art statistical method while running 60 times faster. Moreover, our tests on both simulated and real reads show that it achieves good precision and sensitivity in inferring isoforms when given accurate exon-intron boundary, TSS, and PAS information, especially for isoforms with significantly high expression levels. The software is freely available at http://www.cs.ucr.edu/~jianxing/IsoInfer.html.
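
The abstract describes expression estimation as a convex quadratic program over exons, isoforms, and reads. The sketch below illustrates that general idea only; it is not the IsoInfer formulation or code. It assumes a simplified model in which the expected read count on an exon equals the exon length times the summed expression of the isoforms containing it, and it solves the resulting non-negative least-squares problem by projected gradient descent. The function name `estimate_expression` and all parameter names are hypothetical.

```python
def estimate_expression(membership, lengths, counts, iters=5000):
    """Minimize ||M x - d||^2 subject to x >= 0 (a convex quadratic program).

    membership[i][j] is 1 if isoform i contains exon j, else 0.
    lengths[j] is the length of exon j; counts[j] is its observed read count.
    Returns one non-negative expression value (reads per base) per isoform.
    This is an illustrative model, not the formulation used by IsoInfer.
    """
    n_iso, n_exon = len(membership), len(lengths)
    # M[j][i]: expected reads on exon j per unit expression of isoform i.
    M = [[lengths[j] * membership[i][j] for i in range(n_iso)]
         for j in range(n_exon)]
    # The gradient's Lipschitz constant is at most 2 * ||M||_F^2,
    # so this step size keeps the iteration stable.
    step = 1.0 / (2.0 * sum(v * v for row in M for v in row))
    x = [0.0] * n_iso
    for _ in range(iters):
        # Residual between predicted and observed counts on each exon.
        resid = [sum(M[j][i] * x[i] for i in range(n_iso)) - counts[j]
                 for j in range(n_exon)]
        grad = [2.0 * sum(M[j][i] * resid[j] for j in range(n_exon))
                for i in range(n_iso)]
        # Gradient step followed by projection onto x >= 0.
        x = [max(0.0, x[i] - step * grad[i]) for i in range(n_iso)]
    return x


# Toy gene with three exons; isoform A uses all three, isoform B skips exon 2.
# The counts were generated from expression levels 2.0 (A) and 1.0 (B).
levels = estimate_expression(
    membership=[[1, 1, 1], [1, 0, 1]],
    lengths=[100, 50, 100],
    counts=[300, 100, 300],
)
print(levels)
```

In this toy case the skipped exon is what separates the two isoforms: its count pins down isoform A, and the shared exons then determine isoform B. Real data would also need read noise, splice-junction reads, and the TSS and PAS constraints mentioned in the abstract, none of which this sketch models.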
