Assisted transcriptome reconstruction and splicing orthology.

Author information

Blanquart Samuel, Varré Jean-Stéphane, Guertin Paul, Perrin Amandine, Bergeron Anne, Swenson Krister M

Affiliations

Inria, Université de Lille, Lille, France.

Université de Lille, CNRS, Centrale Lille, Inria, UMR 9189 - CRIStAL, Lille, France.

Publication information

BMC Genomics. 2016 Nov 11;17(Suppl 10):786. doi: 10.1186/s12864-016-3103-6.

Abstract

BACKGROUND

Transcriptome reconstruction, defined as the identification of all protein isoforms that may be expressed by a gene, is a notably difficult computational task. With real data, the best methods based on RNA-seq data identify barely 21 % of the expressed transcripts. While waiting for algorithms and sequencing techniques to improve, as has been strongly suggested in the literature, it is important to evaluate assisted transcriptome prediction, that is, how well alternative transcription in one species predicts protein isoforms in another, relatively close species. Most evidence-based gene predictors use transcripts from other species to annotate a genome, but the predictive power of procedures that use exclusively transcripts from external species has never been quantified. The cornerstone of such an evaluation is the correct identification of pairs of transcripts with the same splicing patterns, called splicing orthologs.

RESULTS

We propose a rigorous procedural definition of splicing orthologs, based on the identification of all ortholog pairs of splicing sites in the nucleotide sequences, and alignments at the protein level. Using our definition, we compared 24 382 human transcripts and 17 909 mouse transcripts from the highly curated CCDS database, and identified 11 122 splicing orthologs. In prediction mode, we show that human transcripts can be used to infer over 62 % of mouse protein isoforms. When restricting the predictions to transcripts known eight years ago, the percentage grows to 74 %. Using CCDS timestamped releases, we also analyze the evolution of the number of splicing orthologs over the last decade.
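
The idea of pairing transcripts through orthologous splicing sites can be illustrated with a minimal sketch. It is a toy simplification, not the authors' procedure: it assumes each transcript is already reduced to a set of splice-site identifiers and that a one-to-one table of orthologous sites (`site_orthologs`) is given, and it omits the protein-level alignment step mentioned above. All names and data below are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

SpliceSites = FrozenSet[str]


def map_sites(sites: SpliceSites,
              site_orthologs: Dict[str, str]) -> Optional[SpliceSites]:
    """Translate human splice sites to mouse sites.

    Returns None if any site has no known ortholog (toy assumption:
    such a transcript cannot have a splicing ortholog).
    """
    mapped = set()
    for site in sites:
        if site not in site_orthologs:
            return None
        mapped.add(site_orthologs[site])
    return frozenset(mapped)


def splicing_orthologs(human: Dict[str, SpliceSites],
                       mouse: Dict[str, SpliceSites],
                       site_orthologs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair transcripts whose splice-site sets coincide after mapping."""
    # Index mouse transcripts by their splice-site set for direct lookup.
    mouse_by_sites: Dict[SpliceSites, List[str]] = {}
    for name, sites in mouse.items():
        mouse_by_sites.setdefault(sites, []).append(name)

    pairs = []
    for h_name, h_sites in human.items():
        mapped = map_sites(h_sites, site_orthologs)
        if mapped is None:
            continue
        for m_name in mouse_by_sites.get(mapped, []):
            pairs.append((h_name, m_name))
    return sorted(pairs)


def recovered_fraction(pairs: List[Tuple[str, str]],
                       mouse: Dict[str, SpliceSites]) -> float:
    """Fraction of mouse transcripts that have at least one human partner."""
    recovered = {m_name for _, m_name in pairs}
    return len(recovered) / len(mouse)


# Hypothetical toy data: d = donor site, a = acceptor site.
site_orthologs = {"h_d1": "m_d1", "h_a1": "m_a1", "h_d2": "m_d2", "h_a2": "m_a2"}
human = {
    "H1": frozenset({"h_d1", "h_a1", "h_d2", "h_a2"}),
    "H2": frozenset({"h_d1", "h_a2"}),  # skips the middle exon
}
mouse = {
    "M1": frozenset({"m_d1", "m_a1", "m_d2", "m_a2"}),
    "M2": frozenset({"m_d1", "m_a2"}),
    "M3": frozenset({"m_d1", "m_a3"}),  # uses a site with no human ortholog
}

pairs = splicing_orthologs(human, mouse, site_orthologs)
fraction = recovered_fraction(pairs, mouse)
```

In this toy setting, two of the three mouse transcripts find a human partner, which mirrors the kind of quantity reported in prediction mode: the share of isoforms in one species that can be inferred from transcripts of the other.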

CONCLUSIONS

Alternative splicing is now recognized to play a major role in the protein diversity of eukaryotic organisms, but definitions of spliced isoform orthologs are still approximate. Here we propose a definition adapted to the subtle variations of conserved alternative splicing sites, and use it to validate numerous accurate orthologous isoform predictions.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b82/5123294/466fec7688a2/12864_2016_3103_Fig1_HTML.jpg
