Bradnam Keith R, Korf Ian
Genome Center, University of California Davis, Davis, California, USA.
PLoS One. 2008 Aug 29;3(8):e3093. doi: 10.1371/journal.pone.0003093.
While many properties of eukaryotic gene structure are well characterized, differences in the form and function of introns that occur at different positions within a transcript are less well understood. In particular, the dynamics of intron length variation with respect to intron position has received relatively little attention. This study analyzes all available data on intron lengths in GenBank and finds a significant trend of increased length in first introns throughout a wide range of species. This trend was found to be even stronger when using high-confidence gene annotation data for three model organisms (Arabidopsis thaliana, Caenorhabditis elegans, and Drosophila melanogaster), which show that the first intron in the 5' UTR is, on average, significantly longer than all downstream introns within a gene. A partial explanation for increased first intron length in A. thaliana is suggested by the increased frequency of certain motifs that are present in first introns. The phenomenon of longer first introns can potentially be used to improve gene prediction software and also to detect errors in existing gene annotations.
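The core comparison described above (first introns versus downstream introns) can be illustrated with a minimal sketch. This is not the authors' pipeline: the input format (per-transcript lists of 1-based, inclusive `(start, end)` exon coordinates, listed in transcript order on the plus strand), the helper names `intron_lengths` and `first_vs_downstream`, and the toy data are all hypothetical.

```python
from statistics import mean


def intron_lengths(exons):
    """Intron lengths for one transcript, in transcript order.

    Assumes (hypothetically) that `exons` is a list of 1-based, inclusive
    (start, end) tuples sorted in transcript order on the plus strand.
    The intron between an exon ending at e and the next exon starting at s
    spans e+1 .. s-1, so its length is s - e - 1.
    """
    return [next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def first_vs_downstream(transcripts):
    """Mean length of first introns vs. mean length of all downstream introns."""
    first, downstream = [], []
    for exons in transcripts:
        lengths = intron_lengths(exons)
        if lengths:  # single-exon transcripts contribute nothing
            first.append(lengths[0])
            downstream.extend(lengths[1:])
    first_mean = mean(first) if first else None
    downstream_mean = mean(downstream) if downstream else None
    return first_mean, downstream_mean


# Toy, made-up transcripts (not data from the study).
transcripts = [
    [(1, 100), (601, 700), (801, 900), (1001, 1100)],  # introns: 500, 100, 100
    [(1, 50), (351, 400), (471, 520)],                 # introns: 300, 70
]

first_mean, downstream_mean = first_vs_downstream(transcripts)
print(first_mean, downstream_mean)
```

A real analysis would also need to handle minus-strand genes (reversing exon order), alternative isoforms, and a statistical test rather than a bare comparison of means.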