Loraine Ann E, Helt Gregg A, Cline Melissa S, Siani-Rose Michael A
Bioinformatics Department, Affymetrix, 6550 Vallejo St, Emeryville, CA 94530 USA.
J Bioinform Comput Biol. 2003 Jul;1(2):289-306. doi: 10.1142/s0219720003000113.
Understanding how alternative splicing affects gene function is an important challenge in molecular biology. Homology-based protein sequence analysis methods should make it possible to investigate how transcript diversity affects protein function. To test this, we deduced high-quality exon-intron structures for over 8000 human genes, including over 1300 (17 percent) that produce multiple transcript variants. We developed a data mining technique (DiffMotif) to identify genes in which transcript variation coincides with changes in conserved motifs between variants. Applying this method, we found that 30 percent of the multi-variant genes in our test set exhibited a differential profile of conserved InterPro and/or BLOCKS motifs across different mRNA variants. To investigate these genes, we developed a visualization tool (ProtAnnot) that displays amino acid motifs in the context of genomic sequence. Using this tool, we analyzed the genes revealed by the DiffMotif method and, when possible, developed hypotheses regarding the potential role of alternative transcript structure in modulating gene function. We present examples, including MEOX1, a homeobox-containing protein; AIRE, involved in autoimmune disease; PLAT, tissue-type plasminogen activator; and CD79b, a component of the B-cell receptor complex. These results demonstrate that amino acid motif databases like BLOCKS and InterPro are useful tools for investigating how alternative transcript structure affects gene function.
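The abstract does not describe how DiffMotif is implemented, so the following is only a minimal sketch of the underlying idea: flag a gene when its transcript variants do not all carry the same set of conserved motifs. The function name `differential_motif_genes`, the nested-dictionary input layout, and all gene, variant, and motif identifiers are hypothetical placeholders, not taken from the paper or from InterPro or BLOCKS.

```python
from typing import Dict, List, Set

# Hypothetical input layout: gene -> variant -> set of motif identifiers
# found in that variant's predicted protein.
GeneMotifs = Dict[str, Dict[str, Set[str]]]


def differential_motif_genes(gene_variants: GeneMotifs) -> List[str]:
    """Return, sorted by name, the multi-variant genes whose variants
    do not all share an identical motif set."""
    flagged = []
    for gene, variants in gene_variants.items():
        # Single-variant genes cannot show a differential profile.
        if len(variants) < 2:
            continue
        motif_sets = list(variants.values())
        # Flag the gene if any variant's motif set differs from the first.
        if any(s != motif_sets[0] for s in motif_sets[1:]):
            flagged.append(gene)
    return sorted(flagged)


# Placeholder data for illustration only (not real annotations).
example: GeneMotifs = {
    "GENE_A": {"variant_1": {"MOTIF_X", "MOTIF_Y"}, "variant_2": {"MOTIF_X"}},
    "GENE_B": {"variant_1": {"MOTIF_Z"}, "variant_2": {"MOTIF_Z"}},
    "GENE_C": {"variant_1": {"MOTIF_X"}},
}

if __name__ == "__main__":
    print(differential_motif_genes(example))
```

In this sketch, `GENE_A` is flagged because one variant lacks `MOTIF_Y`, `GENE_B` is not flagged because both variants carry the same motif, and `GENE_C` is skipped because it has a single variant. A real analysis would also need to decide how to treat partial motif matches, which this comparison of whole sets ignores.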