A comparative method for identification of gene structures and alternatively spliced variants.

Authors

Chuang Trees-Juen, Chen Feng-Chi, Chou Meng-Yuan

Affiliation

Genomics Research Center, Academia Sinica, Taipei, Taiwan.

Publication information

Bioinformatics. 2004 Nov 22;20(17):3064-79. doi: 10.1093/bioinformatics/bth368. Epub 2004 Jun 24.

Abstract

MOTIVATION

Alternative splicing (AS) serves as a mechanism to create diversity among functional proteins. Increasing evidence indicates that a large portion of genes have AS forms. Hence AS variants should be considered while analyzing gene structures.

RESULTS

A new cross-species gene identification and AS analysis system, PSEP, has been developed. The system is based on expressed sequence tag (EST)-to-genome and genome-to-genome comparisons and is implemented in two steps: sequence alignment and a series of post-alignment processes, including progressive signal extraction and patching. For gene identification, these post-alignment processes serve as noise filters and enable PSEP to eliminate approximately 88% of potential overprediction. The overall accuracy of PSEP is better than or comparable to that of other well-known cross-species gene prediction programs, including the ROSETTA program, TWINSCAN, SGP-1/-2 and SLAM, when tested on three benchmark datasets (the ELN gene region, the HoxA cluster and the ROSETTA set). In addition, 76.2 and 76.0% of multiple-exon genes in the ROSETTA dataset and human chromosome 20, respectively, are found to have AS forms. Approximately 23% of the 210 elementary alternatives identified in the ROSETTA dataset are not conserved between the human and mouse genomes, and none of the 210 transcripts is found in the RefSeq annotation. With its dual functions in cross-species conserved sequence analysis and AS analysis, PSEP is highly suitable for studying the evolution of AS patterns and for finding unidentified gene expression features.
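The accuracy comparison above can be made concrete with a small sketch. The abstract does not state which accuracy measure was used, so the code below assumes the nucleotide-level sensitivity and specificity that are conventional in gene-prediction benchmarks, where specificity is defined as TP / (TP + FP). The function names and the exon coordinates are hypothetical and are not taken from PSEP or from any of the benchmark datasets.

```python
def covered_positions(exons):
    """Expand 1-based inclusive (start, end) exon intervals into a set of positions."""
    positions = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) at the nucleotide level.

    Assumption: specificity follows the gene-prediction convention
    TP / (TP + FP), i.e. the fraction of predicted coding bases that
    are annotated as coding.
    """
    pred = covered_positions(predicted)
    true = covered_positions(annotated)
    tp = len(pred & true)  # bases both predicted and annotated as exonic
    sn = tp / len(true) if true else 0.0
    sp = tp / len(pred) if pred else 0.0
    return sn, sp


# Hypothetical coordinates for illustration only.
annotated = [(101, 200), (301, 400)]
predicted = [(101, 200), (301, 350), (501, 550)]
sn, sp = nucleotide_accuracy(predicted, annotated)
```

In this toy case the third predicted exon has no annotated counterpart, which is the kind of overprediction a post-alignment noise filter is meant to reduce; removing it would raise specificity without changing sensitivity.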

