Chen Feng-Chi, Chuang Trees-Juen
Genomics Research Center, Academia Sinica, Academia Road, Nankang, Taipei 11529, Taiwan.
BMC Bioinformatics. 2006 May 19;7:259. doi: 10.1186/1471-2105-7-259.
The evolution of alternatively spliced exons (ASEs) is of primary interest because these exons are suggested to be a major source of functional diversity of proteins. Many exon features have been suggested to affect the evolution of ASEs. However, previous studies have relied on the KA/KS ratio test without taking into consideration information sufficiency (i.e., exon length > 75 bp, cross-species divergence > 5%) of the studied exons, leading to potentially biased interpretations. Furthermore, which exon feature dominates the results of the KA/KS ratio test and whether multiple exon features have additive effects have remained unexplored.
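The information-sufficiency criterion can be pictured as a simple pre-filter applied before any KA/KS ratio test. This is a minimal Python sketch, not the authors' pipeline. It assumes three things: each exon comes as a pairwise alignment of orthologous sequences, "exon length" is the ungapped length of the focal exon, and "cross-species divergence" is the uncorrected p-distance over gap-free columns. The function names `p_distance` and `has_sufficient_information` are hypothetical.

```python
# Thresholds stated in the abstract: exon length > 75 bp, cross-species divergence > 5%.
MIN_LENGTH_BP = 75
MIN_DIVERGENCE = 0.05


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites, skipping columns that contain a gap.

    Assumption: divergence is measured as a p-distance on an existing alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def has_sufficient_information(seq_a: str, seq_b: str) -> bool:
    """Hypothetical filter: keep an exon pair only if both thresholds are strictly exceeded."""
    exon_length = len(seq_a.replace("-", ""))  # ungapped length of the focal exon
    return exon_length > MIN_LENGTH_BP and p_distance(seq_a, seq_b) > MIN_DIVERGENCE


if __name__ == "__main__":
    focal = "ATG" * 30                  # 90 bp
    ortholog = "ATC" * 5 + "ATG" * 25   # 5 differing sites out of 90 (about 5.6%)
    print(has_sufficient_information(focal, ortholog))  # True
```

The comparisons are strict, so an exon of exactly 75 bp or a pair diverging by exactly 5% is discarded. A real analysis might instead use a corrected distance such as Jukes-Cantor.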
In this study, we collect two different datasets for analysis: the ASE dataset (which includes lineage-specific ASEs and conserved ASEs) and the ACE dataset (which includes only conserved ASEs). We first show that information sufficiency can significantly affect the interpretation of the relationship between exon features and the KA/KS ratio test results. After discarding exons with insufficient information, we use a Boolean method to analyze the relationship between test results and four exon features (namely length, protein domain overlapping, inclusion level, and exonic splicing enhancer (ESE) frequency) for the ASE dataset. We demonstrate that length and protein domain overlapping are dominant factors and have similar impacts on the test results of ASEs. In addition, although inclusion level and ESE motif frequency have weak impacts when considered individually, the combination of these two factors still has a minor additive effect on test results. However, the ACE dataset shows a slightly different result in that inclusion level has a marginally significant effect on test results. Lineage-specific ASEs may have contributed to this difference. Overall, in both ASEs and ACEs, protein domain overlapping is the most dominant exon feature affecting test results, while ESE frequency is the weakest.
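One way to picture a Boolean analysis of several exon features is to code each feature as true or false (for example, long vs. short, overlapping a protein domain or not). You can then compare how often a given KA/KS test outcome occurs among exons that share each combination of features. The sketch below illustrates that general idea only and is not the authors' method. The feature names, the dichotomization, the `test_positive` flag (standing for whichever test outcome is being tracked), and the toy records are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical Boolean features, each assumed to be already dichotomized.
FEATURES = ("long", "domain_overlap", "high_inclusion", "high_ese")

Exon = Dict[str, bool]


def outcome_rate(exons: List[Exon], required: Tuple[str, ...]) -> Optional[float]:
    """Fraction of exons with test_positive == True among exons having every required feature."""
    subset = [e for e in exons if all(e[f] for f in required)]
    if not subset:
        return None  # no exon carries this feature combination
    return sum(e["test_positive"] for e in subset) / len(subset)


def tabulate(
    exons: List[Exon], features: Tuple[str, ...] = FEATURES
) -> Dict[Tuple[str, ...], Optional[float]]:
    """Outcome rate for every subset of features, from the empty set (all exons) upward."""
    table: Dict[Tuple[str, ...], Optional[float]] = {}
    for k in range(len(features) + 1):
        for combo in combinations(features, k):
            table[combo] = outcome_rate(exons, combo)
    return table


def make_exon(long: bool, domain: bool, inclusion: bool, ese: bool, positive: bool) -> Exon:
    """Build a toy exon record (hypothetical fields, not data from the study)."""
    return {"long": long, "domain_overlap": domain, "high_inclusion": inclusion,
            "high_ese": ese, "test_positive": positive}


if __name__ == "__main__":
    toy = [
        make_exon(True, True, False, False, True),
        make_exon(True, False, False, False, False),
        make_exon(False, True, False, False, True),
        make_exon(False, False, False, False, False),
    ]
    for combo, rate in tabulate(toy).items():
        print(combo or ("all exons",), rate)
```

Suppose adding a second feature to a subset raises the outcome rate above what either feature gives alone. That pattern is consistent with an additive effect. However, a real analysis would also need a significance test for each comparison, for example a test on the underlying 2x2 counts.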
The proposed method can readily identify additive effects of individual or multiple factors on the KA/KS ratio test results of exons. Therefore, the system can analyze complex evolutionary conditions in which multiple features are involved. More factors can also be added to the system to extend the scope of evolutionary analysis of exons. In addition, our method may be useful when orthologous exons cannot be found for the KA/KS ratio test.