Banting and Best Department of Medical Research, University of Toronto, ON, Canada.
Bioinformatics. 2010 Jun 15;26(12):i325-33. doi: 10.1093/bioinformatics/btq200.
Transcripts from approximately 95% of human multi-exon genes undergo alternative splicing (AS). Interest in AS continues to grow because of its prominent contribution to transcriptome and proteome complexity and the role of aberrant AS in numerous diseases. Recent technological advances enable thousands of exons to be profiled simultaneously across diverse cell types and cellular conditions. Exploiting such data requires accurate identification of condition-specific splicing changes, which is necessary to elucidate the underlying regulatory programs or to link the splicing changes to specific diseases.
We present a probabilistic model tailored for high-throughput AS data, in which observed isoform levels are explained as combinations of condition-specific AS signals. Under this formulation, given an AS dataset, our tasks are to detect common signals in the data and to identify the exons relevant to each signal. Our model can incorporate prior knowledge about the underlying AS signals, measurement quality and gene expression level effects. Using a large-scale multi-tissue AS dataset, we demonstrate the advantage of our method over standard alternative approaches. In addition, we describe newly found tissue-specific AS signals that were verified experimentally, and discuss associated regulatory features.
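To make the idea of "explaining isoform levels by condition-specific signals" concrete, here is a minimal sketch. It is not the authors' model: it assumes isotropic Gaussian noise, assumes each exon follows at most one candidate signal (or none), and treats the candidate signals as given prior knowledge rather than detecting them. The function and parameter names (`signal_posteriors`, `sigma`, `prior`) are hypothetical; `sigma` loosely stands in for measurement quality and `prior` for prior belief about each signal.

```python
import math
from typing import List, Optional


def center(v: List[float]) -> List[float]:
    """Subtract the mean so only relative differences across conditions remain."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def signal_posteriors(psi: List[List[float]], signals: List[List[float]],
                      sigma: float = 0.1,
                      prior: Optional[List[float]] = None) -> List[List[float]]:
    """Posterior over 'no change' plus each candidate signal, for each exon.

    psi[i][c]: observed inclusion level of exon i in condition c.
    signals[s][c]: hypothetical candidate signal profile s.
    Index 0 of each output row is the 'no change' background.
    Simplifying assumption: each exon follows at most one signal.
    """
    n_cond = len(psi[0])
    # Background (all zeros after centering) followed by centered candidate signals
    candidates = [[0.0] * n_cond] + [center(s) for s in signals]
    k = len(candidates)
    if prior is None:
        prior = [1.0 / k] * k  # uniform prior over background and signals
    result = []
    for row in psi:
        x = center(row)
        # Isotropic Gaussian log-likelihood (up to a constant) plus log prior
        logs = [
            -sum((a - b) ** 2 for a, b in zip(x, cand)) / (2 * sigma ** 2)
            + math.log(p)
            for cand, p in zip(candidates, prior)
        ]
        top = max(logs)  # subtract the max for numerical stability
        weights = [math.exp(l - top) for l in logs]
        total = sum(weights)
        result.append([w / total for w in weights])
    return result


# Hypothetical 3-condition example: one candidate signal with higher inclusion
# in condition 0; the first exon follows it, the second shows no change.
post = signal_posteriors([[0.9, 0.4, 0.4], [0.5, 0.5, 0.5]], [[0.5, 0.0, 0.0]])
print([max(range(len(r)), key=r.__getitem__) for r in post])  # [1, 0]
```

Centering each profile removes exon-specific baseline inclusion so that only the condition-dependent pattern is compared; a full model would also need to learn the signals themselves and allow exons to combine several signals.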
Supplementary data are available at Bioinformatics online.