AIDE: annotation-assisted isoform discovery with high precision.

Affiliations

Department of Biostatistics and Epidemiology, Rutgers School of Public Health, Rutgers, The State University of New Jersey, Piscataway, New Jersey 08854, USA.

Department of Statistics, University of California, Los Angeles, California 90095, USA.

Publication information

Genome Res. 2019 Dec;29(12):2056-2072. doi: 10.1101/gr.251108.119. Epub 2019 Nov 6.

DOI:10.1101/gr.251108.119

PMID:31694868

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6886511/

Abstract

Genome-wide accurate identification and quantification of full-length mRNA isoforms is crucial for investigating transcriptional and posttranscriptional regulatory mechanisms of biological phenomena. Despite continuing efforts in developing effective computational tools to identify or assemble full-length mRNA isoforms from second-generation RNA-seq data, it remains a challenge to accurately identify mRNA isoforms from short sequence reads owing to the substantial information loss in RNA-seq experiments. Here, we introduce a novel statistical method, annotation-assisted isoform discovery (AIDE), the first approach that directly controls false isoform discoveries by implementing the testing-based model selection principle. Solving the isoform discovery problem in a stepwise and conservative manner, AIDE prioritizes the annotated isoforms and precisely identifies novel isoforms whose addition significantly improves the explanation of observed RNA-seq reads. We evaluate the performance of AIDE based on multiple simulated and real RNA-seq data sets followed by PCR-Sanger sequencing validation. Our results show that AIDE effectively leverages the annotation information to compensate for the information loss owing to short read lengths. AIDE achieves the highest precision in isoform discovery and the lowest error rates in isoform abundance estimation, compared with three state-of-the-art methods: Cufflinks, SLIDE, and StringTie. As a robust bioinformatics tool for transcriptome analysis, AIDE enables researchers to discover novel transcripts with high confidence.
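
The abstract describes a stepwise, testing-based selection in which annotated isoforms are prioritized and a novel isoform is added only when it significantly improves the explanation of the reads. The following is a minimal sketch of that general idea, not the AIDE implementation. Everything specific in it is an assumption made for illustration: the function names, the simple mixture model fitted by EM, the `NOISE_FLOOR` constant for reads that no selected isoform can produce, the chi-square reference distribution with one degree of freedom, and the toy data.

```python
import math

# Hypothetical noise floor: likelihood assigned to a read that no selected
# isoform can explain (an assumption of this sketch, not taken from the paper).
NOISE_FLOOR = 1e-6


def fit_loglik(reads, isoforms, n_iter=200):
    """Fit mixture proportions by EM and return the resulting log-likelihood.

    reads: list of dicts mapping isoform name -> P(read | isoform);
           a missing key means the read is incompatible with that isoform.
    """
    if not isoforms:
        return len(reads) * math.log(NOISE_FLOOR)
    k = len(isoforms)
    alpha = [1.0 / k] * k
    for _ in range(n_iter):
        counts = [0.0] * k
        for r in reads:
            w = [alpha[j] * r.get(iso, 0.0) for j, iso in enumerate(isoforms)]
            total = sum(w)
            if total > 0:  # reads explained by no isoform are skipped here
                for j in range(k):
                    counts[j] += w[j] / total
        s = sum(counts)
        if s == 0:
            break
        alpha = [c / s for c in counts]
    loglik = 0.0
    for r in reads:
        p = sum(alpha[j] * r.get(iso, 0.0) for j, iso in enumerate(isoforms))
        loglik += math.log(max(p, NOISE_FLOOR))
    return loglik


def lrt_pvalue(ll_small, ll_large):
    """Likelihood ratio test p-value, assuming a chi-square(1) reference."""
    stat = max(2.0 * (ll_large - ll_small), 0.0)
    # For one degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def stepwise_discovery(reads, annotated, novel, p_threshold=0.01):
    """Start from the annotated isoforms; add novel ones only if significant."""
    selected = list(annotated)
    remaining = list(novel)
    ll = fit_loglik(reads, selected)
    while remaining:
        # Score each candidate by the log-likelihood after adding it.
        best_ll, best = max((fit_loglik(reads, selected + [c]), c)
                            for c in remaining)
        if lrt_pvalue(ll, best_ll) >= p_threshold:
            break  # conservative stop: no candidate passes the test
        selected.append(best)
        remaining.remove(best)
        ll = best_ll
    return selected


# Toy data (made up): A is annotated; N1 explains reads that A cannot;
# N2 is indistinguishable from A on every read it is compatible with.
toy_reads = ([{"A": 0.01}] * 30
             + [{"A": 0.01, "N2": 0.01}] * 30
             + [{"N1": 0.01}] * 40)
print(stepwise_discovery(toy_reads, ["A"], ["N1", "N2"]))
```

In this toy setting, N1 accounts for reads that the annotated isoform cannot produce, so its addition passes the test, whereas N2 adds no explanatory power and is left out. The chi-square(1) reference is a simplification: a mixture proportion of zero lies on the boundary of the parameter space, so the null distribution in a real analysis would need more careful treatment.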

Similar articles

A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing.

BMC Genomics. 2017 May 22;18(1):395. doi: 10.1186/s12864-017-3757-8.

Studying Isoform-Specific mRNA Recruitment to Polyribosomes with Frac-seq.

Methods Mol Biol. 2016;1358:99-108. doi: 10.1007/978-1-4939-3067-8_6.

NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data.

BMC Genomics. 2016 Jan 11;17 Suppl 1(Suppl 1):11. doi: 10.1186/s12864-015-2304-8.

Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation.

Proc Natl Acad Sci U S A. 2011 Dec 13;108(50):19867-72. doi: 10.1073/pnas.1113972108. Epub 2011 Dec 1.

Computational approaches for isoform detection and estimation: good and bad news.

BMC Bioinformatics. 2014 May 9;15:135. doi: 10.1186/1471-2105-15-135.

Freddie: annotation-independent detection and discovery of transcriptomic alternative splicing isoforms using long-read sequencing.

Nucleic Acids Res. 2023 Jan 25;51(2):e11. doi: 10.1093/nar/gkac1112.

Identifying and quantifying isoforms from accurate full-length transcriptome sequencing reads with Mandalorion.

Genome Biol. 2023 Jul 17;24(1):167. doi: 10.1186/s13059-023-02999-6.

Joint estimation of isoform expression and isoform-specific read distribution using multisample RNA-Seq data.

Bioinformatics. 2014 Feb 15;30(4):506-13. doi: 10.1093/bioinformatics/btt704. Epub 2013 Dec 3.

TIGAR2: sensitive and accurate estimation of transcript isoform expression with longer RNA-Seq reads.

BMC Genomics. 2014;15 Suppl 10(Suppl 10):S5. doi: 10.1186/1471-2164-15-S10-S5. Epub 2014 Dec 12.

Cited by

Transcriptome-wide decoding the roles of aberrant splicing in melanoma MAPK-targeted resistance evolution.

EMBO Rep. 2025 Jul 18. doi: 10.1038/s44319-025-00521-6.

Predicting and comparing transcription start sites in single cell populations.

PLoS Comput Biol. 2025 Apr 3;21(4):e1012878. doi: 10.1371/journal.pcbi.1012878. eCollection 2025.

TDP-43 loss induces extensive cryptic polyadenylation in ALS/FTD.

bioRxiv. 2024 Jan 23:2024.01.22.576625. doi: 10.1101/2024.01.22.576625.

Transcriptomics for Clinical and Experimental Biology Research: Hang on a Seq.

Adv Genet (Hoboken). 2023 Jan 17;4(2):2200024. doi: 10.1002/ggn2.202200024. eCollection 2023 Jun.

Partitioning RNAs by length improves transcriptome reconstruction from short-read RNA-seq data.

Nat Biotechnol. 2022 May;40(5):741-750. doi: 10.1038/s41587-021-01136-7. Epub 2022 Jan 10.

MAAPER: model-based analysis of alternative polyadenylation using 3' end-linked reads.

Genome Biol. 2021 Aug 10;22(1):222. doi: 10.1186/s13059-021-02429-5.

Maternal cecal microbiota transfer rescues early-life antibiotic-induced enhancement of type 1 diabetes in mice.

Cell Host Microbe. 2021 Aug 11;29(8):1249-1265.e9. doi: 10.1016/j.chom.2021.06.014. Epub 2021 Jul 21.

Anti-bias training for (sc)RNA-seq: experimental and computational approaches to improve precision.

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab148.
