Knapp Keith, Chonka Ashley, Chen Yi-Ping Phoebe
Faculty of Science and Technology, Deakin University, Victoria, Australia.
BMC Genomics. 2008 Sep 20;9:428. doi: 10.1186/1471-2164-9-428.
The existence of exons and introns has been known for thirty years. Despite this, there has been little formal research into the categorization of exons. The exon taxonomies researchers use tend to be chosen ad hoc or based on an information-poor de facto standard. Exons have been shown to have specific properties and functions that depend on, among other things, their location and order. These factors should inform exon naming so that it is clear which exon type(s) are in question.
POEM (Protein Oriented Exon Monikers) is a new taxonomy focused on protein-proximal exons. It integrates three dimensions of information (Global Position, Regional Position and Region), and its exon categories are therefore based on known statistical exon features. Applying POEM to two congruent untranslated exon datasets yields the following statistical properties. First, the POEM taxonomy resolves previous wide-ranging estimates of initial 5' untranslated region exons: according to our datasets, 29-36% of genes have wholly untranslated first exons. Second, sequences containing untranslated exons consistently have up to 6 times more 5' untranslated exons than 3' untranslated exons. Finally, three exon patterns are identified that account for 70% of genes with untranslated exons.
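To make the idea of a three-dimensional exon label concrete, the following is a minimal Python sketch. It is not the POEM naming scheme itself: the category names (`"first"`, `"internal"`, `"last"`, `"single"`, `"5UTR"`, `"CDS"`, `"3UTR"`), the `ExonLabel` class and the helper functions are all hypothetical, and the coordinate convention (half-open intervals, exons sorted 5' to 3' on the plus strand) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end), assumed 5'->3' on the plus strand


@dataclass(frozen=True)
class ExonLabel:
    """Hypothetical three-part label; the names are illustrative, not POEM's own."""
    global_position: str    # position among all exons of the transcript
    regional_position: str  # position among exons sharing the same region
    region: str             # "5UTR", "CDS", "3UTR", or a "+"-joined combination


def _position(index: int, count: int) -> str:
    """Map an index within a group of `count` items to a coarse position name."""
    if count == 1:
        return "single"
    if index == 0:
        return "first"
    if index == count - 1:
        return "last"
    return "internal"


def exon_region(exon: Interval, cds: Interval) -> str:
    """Name the region(s) an exon covers relative to the coding sequence."""
    start, end = exon
    cds_start, cds_end = cds
    parts = []
    if start < cds_start:
        parts.append("5UTR")   # some bases lie upstream of the CDS
    if start < cds_end and end > cds_start:
        parts.append("CDS")    # the exon overlaps the CDS
    if end > cds_end:
        parts.append("3UTR")   # some bases lie downstream of the CDS
    return "+".join(parts)


def label_exons(exons: List[Interval], cds: Interval) -> List[ExonLabel]:
    """Attach a three-part label to each exon of one transcript."""
    regions = [exon_region(exon, cds) for exon in exons]
    labels = []
    for i, region in enumerate(regions):
        # Indices of all exons that share this exon's region label.
        same_region = [j for j, r in enumerate(regions) if r == region]
        labels.append(
            ExonLabel(
                global_position=_position(i, len(exons)),
                regional_position=_position(same_region.index(i), len(same_region)),
                region=region,
            )
        )
    return labels


def has_untranslated_first_exon(labels: List[ExonLabel]) -> bool:
    """True when the first exon lies wholly in the 5' untranslated region."""
    return bool(labels) and labels[0].region == "5UTR"


# Made-up transcript: four exons, with the CDS starting in exon 2 and ending in exon 3.
example_exons = [(0, 100), (200, 300), (400, 500), (600, 700)]
example_labels = label_exons(example_exons, cds=(250, 450))
```

Under these assumptions, a statistic such as the share of genes with a wholly untranslated first exon could be estimated by applying `has_untranslated_first_exon` to one labelled transcript per gene and averaging the results.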
We describe a thorough three-dimensional exon taxonomy called POEM, which is biologically and statistically relevant. No previous taxonomy provides such fine-grained information while still including all valid information dimensions. By providing a common taxonomy, POEM will improve the accuracy of gene-finder comparisons and analysis. Its fine granularity will also facilitate unambiguous communication.