Assisted transcriptome reconstruction and splicing orthology.

Authors

Blanquart Samuel, Varré Jean-Stéphane, Guertin Paul, Perrin Amandine, Bergeron Anne, Swenson Krister M

Affiliations

Inria, Université de Lille, Lille, France.

Université de Lille, CNRS, Centrale Lille, Inria, UMR 9189 - CRIStAL, Lille, France.

Publication information

BMC Genomics. 2016 Nov 11;17(Suppl 10):786. doi: 10.1186/s12864-016-3103-6.

DOI:10.1186/s12864-016-3103-6

PMID:28185551

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5123294/

Abstract

BACKGROUND

Transcriptome reconstruction, defined as the identification of all protein isoforms that may be expressed by a gene, is a notably difficult computational task. On real data, the best RNA-seq-based methods identify barely 21% of the expressed transcripts. While waiting for algorithms and sequencing techniques to improve, as has been strongly suggested in the literature, it is important to evaluate assisted transcriptome prediction: the question of how well alternative transcription in one species predicts protein isoforms in another, relatively close species. Most evidence-based gene predictors use transcripts from other species to annotate a genome, but the predictive power of procedures that use exclusively transcripts from external species has never been quantified. The cornerstone of such an evaluation is the correct identification of pairs of transcripts with the same splicing patterns, called splicing orthologs.

RESULTS

We propose a rigorous procedural definition of splicing orthologs, based on the identification of all orthologous pairs of splicing sites in the nucleotide sequences, and on alignments at the protein level. Using our definition, we compared 24,382 human transcripts and 17,909 mouse transcripts from the highly curated CCDS database, and identified 11,122 splicing orthologs. In prediction mode, we show that human transcripts can be used to infer over 62% of mouse protein isoforms. When the predictions are restricted to transcripts known eight years ago, the percentage grows to 74%. Using timestamped CCDS releases, we also analyze the evolution of the number of splicing orthologs over the last decade.
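The core idea of matching transcripts through orthologous splice sites can be illustrated with a minimal Python sketch. This is not the authors' procedure: it assumes the site-level orthology has already been computed and is supplied as a dictionary, it models each transcript only as the set of splice sites it uses, and it omits the protein-level alignment step mentioned in the abstract. The function name `splicing_ortholog_pairs` and the data layout are hypothetical, chosen for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical model: a splice site is (kind, position), kind being "donor" or "acceptor".
SpliceSite = Tuple[str, int]


def splicing_ortholog_pairs(
    human_tx: Dict[str, Set[SpliceSite]],
    mouse_tx: Dict[str, Set[SpliceSite]],
    site_orthology: Dict[SpliceSite, SpliceSite],
) -> List[Tuple[str, str]]:
    """Pair transcripts whose splice-site sets correspond exactly.

    Assumes all transcripts come from one pair of orthologous genes and that
    site_orthology maps human splice sites to their mouse orthologs.
    """
    # Index mouse transcripts by the exact set of splice sites they use.
    mouse_by_sites: Dict[FrozenSet[SpliceSite], List[str]] = {}
    for mouse_id, sites in mouse_tx.items():
        mouse_by_sites.setdefault(frozenset(sites), []).append(mouse_id)

    pairs: List[Tuple[str, str]] = []
    for human_id, sites in human_tx.items():
        # A transcript using a site without a mouse ortholog cannot be matched.
        if any(site not in site_orthology for site in sites):
            continue
        mapped = frozenset(site_orthology[site] for site in sites)
        # Reject mappings that collapse two human sites onto one mouse site.
        if len(mapped) != len(sites):
            continue
        for mouse_id in mouse_by_sites.get(mapped, []):
            pairs.append((human_id, mouse_id))
    return pairs


# Toy example with made-up coordinates.
human = {
    "H1": {("donor", 100), ("acceptor", 200)},
    "H2": {("donor", 100), ("acceptor", 260)},
}
mouse = {"M1": {("donor", 90), ("acceptor", 180)}}
orthology = {("donor", 100): ("donor", 90), ("acceptor", 200): ("acceptor", 180)}
pairs = splicing_ortholog_pairs(human, mouse, orthology)
```

In this toy input, `H1` pairs with `M1`, while `H2` is left unmatched because one of its acceptor sites has no mouse ortholog. In the assisted-prediction setting described above, an unmatched human transcript whose sites all have mouse orthologs would be a candidate prediction for a mouse isoform.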

CONCLUSIONS

Alternative splicing is now recognized to play a major role in the protein diversity of eukaryotic organisms, but definitions of spliced isoform orthologs are still approximate. Here we propose a definition adapted to the subtle variations of conserved alternative splicing sites, and use it to validate numerous accurate orthologous isoform predictions.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b82/5123294/466fec7688a2/12864_2016_3103_Fig1_HTML.jpg

Similar articles

Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog.

BMC Genomics. 2022 Mar 18;23(1):216. doi: 10.1186/s12864-022-08429-4.

Significant variations in alternative splicing patterns and expression profiles between human-mouse orthologs in early embryos.

Sci China Life Sci. 2017 Feb;60(2):178-188. doi: 10.1007/s11427-015-0348-5. Epub 2016 Jul 4.

Assessment of orthologous splicing isoforms in human and mouse orthologous genes.

BMC Genomics. 2010 Oct 1;11:534. doi: 10.1186/1471-2164-11-534.

Alternatively Spliced Homologous Exons Have Ancient Origins and Are Highly Expressed at the Protein Level.

PLoS Comput Biol. 2015 Jun 10;11(6):e1004325. doi: 10.1371/journal.pcbi.1004325. eCollection 2015 Jun.

Alternative Splicing Signatures in RNA-seq Data: Percent Spliced in (PSI).

Curr Protoc Hum Genet. 2015 Oct 6;87:11.16.1-11.16.14. doi: 10.1002/0471142905.hg1116s87.

Computational Methods and Correlation of Exon-skipping Events with Splicing, Transcription, and Epigenetic Factors.

Methods Mol Biol. 2017;1513:163-170. doi: 10.1007/978-1-4939-6539-7_11.

Comprehensive splicing graph analysis of alternative splicing patterns in chicken, compared to human and mouse.

BMC Genomics. 2009 Jul 7;10 Suppl 1(Suppl 1):S5. doi: 10.1186/1471-2164-10-S1-S5.

Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate.

BMC Bioinformatics. 2015 Oct 16;16:332. doi: 10.1186/s12859-015-0750-6.

Annotation of Alternatively Spliced Proteins and Transcripts with Protein-Folding Algorithms and Isoform-Level Functional Networks.

Methods Mol Biol. 2017;1558:415-436. doi: 10.1007/978-1-4939-6783-4_20.

Cited by

SimSpliceEvol2: alternative splicing-aware simulation of biological sequence evolution and transcript phylogenies.

BMC Bioinformatics. 2024 Jul 11;25(1):235. doi: 10.1186/s12859-024-05853-z.

Identifying genes with conserved splicing structure and orthologous isoforms in human, mouse and dog.

BMC Genomics. 2022 Mar 18;23(1):216. doi: 10.1186/s12864-022-08429-4.

ExceS-A: an exon-centric split aligner.

J Integr Bioinform. 2022 Mar 7;19(1):20210040. doi: 10.1515/jib-2021-0040.

Insights Into the Albinism Mechanism for Two Distinct Color Morphs of Northern Snakehead, Through Histological and Transcriptome Analyses.

Front Genet. 2020 Sep 18;11:830. doi: 10.3389/fgene.2020.00830. eCollection 2020.

SimSpliceEvol: alternative splicing-aware simulation of biological sequence evolution.

BMC Bioinformatics. 2019 Dec 17;20(Suppl 20):640. doi: 10.1186/s12859-019-3207-5.

SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups.

BMC Bioinformatics. 2019 Mar 29;20(Suppl 3):133. doi: 10.1186/s12859-019-2647-2.
