Tang H, Heeley T, Morlec R, Hubbard SJ
Faculty of Life Sciences, The University of Manchester, Manchester, UK.
Cytogenet Genome Res. 2007;117(1-4):268-77. doi: 10.1159/000103188.
Alternative splicing is believed to produce the greatest diversity in transcriptional complexity and function in eukaryotic species. In this study, we present an analysis of alternative splicing events in the chicken, using the recently sequenced genome and over 580,000 EST sequences mapped back to it. We present a carefully controlled EST-to-genome mapping pipeline, built around the EXONERATE program with the est2genome model, which includes several quality-control steps to filter out erroneous matches. The data are then used to estimate the level of alternative splicing relative to Ensembl-predicted transcripts. The EST-genome mappings are characterised at the exon level in order to classify individual splicing events and to estimate the number of novel transcripts not currently annotated in the Ensembl genome database. This is the first large-scale analysis of its kind in an avian species, and it suggests that the chicken displays a level of alternative splicing similar to that found in other higher vertebrates such as human and mouse, both in the number of genes that undergo alternative splicing and in the average number of transcripts produced per gene. The EST data suggest that alternative splicing may occur in some 50-60% of the chicken gene set, with an average of around 2.3 transcripts per alternatively spliced gene. The EST data are also used to examine gene and transcript usage across the tissues sequenced in embryonic and adult libraries. Genes that display notable biases, including twinfilin-2 and embryonic heavy chain myosin, were analysed in more detail. This also highlights several genes that are not yet functionally annotated but appear to be important in embryonic tissues and also undergo alternative splicing.
The analysis also demonstrates some of the difficulties involved in using EST-based data to annotate transcriptional activity in eukaryotic genes, where a broad spectrum of tissues and a large number of sequenced transcripts are required in order to fully characterise alternate splicing and differential expression.
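The abstract describes filtering EST-to-genome alignments and classifying splicing events at the exon level, but it does not state the filtering thresholds or the classification rules. The following is a minimal sketch under explicit assumptions. It assumes the spliced alignments (for example, output from EXONERATE run with the est2genome model) have already been parsed into per-EST intron chains grouped by gene. The `EstAlignment` record, the `MIN_IDENTITY`/`MIN_COVERAGE` thresholds, the helper names, and the event categories are all hypothetical illustrations and are not the authors' pipeline. Coordinates are assumed to be on the forward strand.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional

Intron = tuple[int, int]  # (start, end) genomic coordinates, start < end


@dataclass(frozen=True)
class EstAlignment:
    est_id: str
    percent_identity: float      # identity over the aligned region, 0-100
    coverage: float              # fraction of the EST that aligned, 0-1
    introns: tuple[Intron, ...]  # intron chain inferred from the spliced alignment


# Hypothetical QC thresholds, for illustration only (not the study's values).
MIN_IDENTITY = 95.0
MIN_COVERAGE = 0.90


def passes_qc(aln: EstAlignment) -> bool:
    """Keep only high-identity, near-full-length alignments."""
    return aln.percent_identity >= MIN_IDENTITY and aln.coverage >= MIN_COVERAGE


def skips_exon(long: Intron, chains: set[tuple[Intron, ...]]) -> bool:
    """True if some chain splices from long's donor into an exon and from
    that exon to long's acceptor, i.e. `long` skips a cassette exon."""
    for chain in chains:
        starts = [i for i in chain if i[0] == long[0]]
        ends = [i for i in chain if i[1] == long[1]]
        if any(s[1] < e[0] for s in starts for e in ends):
            return True
    return False


def classify(a: Intron, b: Intron, chains) -> Optional[str]:
    """Label a pair of introns (forward strand assumed); None means no conflict."""
    if a == b or a[1] <= b[0] or b[1] <= a[0]:
        return None  # identical or non-overlapping introns do not conflict
    if a[0] == b[0] or a[1] == b[1]:
        longer = max(a, b, key=lambda i: i[1] - i[0])
        if skips_exon(longer, chains):
            return "exon skipping"
        # Shared donor -> different acceptor (3' site), and vice versa.
        return "alternative 3' site" if a[0] == b[0] else "alternative 5' site"
    return "other"


def gene_events(alignments: list[EstAlignment]) -> set[str]:
    """Event types supported by the QC-passing, spliced ESTs of one gene."""
    chains = {a.introns for a in alignments if passes_qc(a) and a.introns}
    introns = sorted({i for chain in chains for i in chain})
    events = set()
    for a, b in combinations(introns, 2):
        label = classify(a, b, chains)
        if label:
            events.add(label)
    return events


def fraction_alternatively_spliced(genes: dict[str, list[EstAlignment]]) -> float:
    """Share of genes with spliced, QC-passing EST evidence that show any event."""
    covered = spliced = 0
    for alns in genes.values():
        if not any(passes_qc(a) and a.introns for a in alns):
            continue  # no usable splicing evidence for this gene
        covered += 1
        if gene_events(alns):
            spliced += 1
    return spliced / covered if covered else 0.0
```

The sketch calls a gene alternatively spliced when two observed introns overlap but differ. It does not simply count distinct intron chains. This choice matters because partial ESTs from a single transcript yield different, non-conflicting chains, and counting chains naively would inflate the estimate. Estimating full transcripts per gene (the abstract's ~2.3 figure) would need an additional assembly step, which is not shown here.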