

Nucleotide-level distance metrics to quantify alternative splicing implemented in TranD.

Affiliations

Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL 32611, USA.

University of Florida Genetics Institute, University of Florida, Gainesville, FL 32611, USA.

Publication information

Nucleic Acids Res. 2024 Mar 21;52(5):e28. doi: 10.1093/nar/gkae056.

Abstract

Advances in affordable transcriptome sequencing, combined with better exon and gene prediction, have motivated many to compare transcription across the tree of life. We develop a mathematical framework to calculate complexity and compare transcript models. Structural features, i.e. intron retention (IR), donor/acceptor site variation, alternative exon cassettes and alternative 5'/3' UTRs, are compared, and the distance between transcript models is calculated with nucleotide-level precision. All metrics are implemented in a PyPI package, TranD, and its output can be used to summarize splicing patterns within a transcriptome (1GTF) and between transcriptomes (2GTF). TranD output enables quantitative comparisons between: annotations augmented by empirical RNA-seq data and the original transcript models; transcript model prediction tools for long-read RNA-seq (e.g. FLAIR versus Isoseq3); alternate annotations for a species (e.g. RefSeq versus Ensembl); and closely related species. In C. elegans, Z. mays, D. melanogaster, D. simulans and H. sapiens, alternative exons were observed more frequently in combination with an alternative donor/acceptor than alone. Transcript models in RefSeq and Ensembl are linked, and both have unique transcript models with empirical support. D. melanogaster and D. simulans share many transcript models, and long-read RNA-seq data suggest that both species are under-annotated. We recommend combined references.
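To make the idea of a nucleotide-level distance between two transcript models concrete, the following is a minimal sketch in Python. It is not TranD's API, and it does not reproduce the paper's exact metric definitions: the function names, the output keys and the example coordinates are all hypothetical. It assumes each transcript is a list of 1-based, inclusive exon coordinates on the same chromosome and strand.

```python
def exon_positions(exons):
    """Return the set of genomic positions covered by a list of exons.

    Assumption: exons are (start, end) tuples, 1-based and inclusive,
    as in GTF coordinates.
    """
    covered = set()
    for start, end in exons:
        covered.update(range(start, end + 1))
    return covered


def nucleotide_distance(exons_a, exons_b):
    """Count shared and transcript-specific exonic nucleotides.

    Illustrative only: a simple set-based comparison, not the
    metrics implemented in TranD.
    """
    a = exon_positions(exons_a)
    b = exon_positions(exons_b)
    only_a = len(a - b)
    only_b = len(b - a)
    total = len(a | b)
    return {
        "shared_nt": len(a & b),
        "only_a_nt": only_a,
        "only_b_nt": only_b,
        # Fraction of all exonic positions found in only one transcript.
        "prop_different": (only_a + only_b) / total if total else 0.0,
    }


# Hypothetical plus-strand transcripts: B extends exon 1 by 10 nt,
# which corresponds to an alternative donor site.
transcript_a = [(100, 200), (300, 400)]
transcript_b = [(100, 210), (300, 400)]
result = nucleotide_distance(transcript_a, transcript_b)
print(result)
```

In this toy case the two models share 202 nucleotides and differ by the 10 nucleotides unique to transcript B, so a small alternative donor shift yields a small but non-zero distance, whereas a retained intron or a skipped exon would contribute proportionally more.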


Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ba1c/10954468/1bb93cb78b64/gkae056figgra1.jpg
