Rasche Axel, Lienhard Matthias, Yaspo Marie-Laure, Lehrach Hans, Herwig Ralf
Max-Planck-Institute for Molecular Genetics, Department of Vertebrate Genomics, Ihnestrasse 63-73, 14195 Berlin, Germany
Nucleic Acids Res. 2014 Aug;42(14):e110. doi: 10.1093/nar/gku495. Epub 2014 Jun 11.
The computational prediction of alternative splicing from high-throughput sequencing data is inherently difficult and requires robust statistical measures, because the differential splicing signal is confounded by factors such as gene expression differences and the simultaneous expression of multiple isoforms, among others. In this work we describe ARH-seq, a discovery tool for differential splicing in case-control studies that is based on the information-theoretic concept of entropy. ARH-seq works on high-throughput sequencing data and is an extension of the ARH method, which was originally developed for exon microarrays. We show that the method has inherent features, such as independence of transcript exon number and independence of differential expression, which make it particularly suited for detecting alternative splicing events from sequencing data. To test and validate our workflow, we challenged it with publicly available sequencing data derived from human tissues and compared it with eight alternative computational methods. To judge the performance of the different methods, we constructed a benchmark data set of true positive splicing events across different tissues, aggregated from public databases, and show that ARH-seq is an accurate, computationally fast and high-performing method for detecting differential splicing events.
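To make the entropy idea concrete, the following is a minimal illustrative sketch and not the ARH-seq implementation: the abstract does not state the method's formula, so the function name `entropy_splicing_score`, the pseudocount, and the exact weighting are assumptions chosen for illustration. The sketch centers per-exon log2 fold changes on their median, so that a uniform gene-level expression change cancels out, then measures how unevenly the remaining deviations are distributed across exons.

```python
import math
from statistics import median


def entropy_splicing_score(case, control, pseudocount=1.0):
    """Illustrative entropy-based score for one gene (hypothetical helper).

    case, control: per-exon expression values (e.g. read counts), same order.
    Returns log2(m) - H, where m is the number of exons and H is the Shannon
    entropy of the normalized exon deviations. The score is 0 when all exons
    change by the same factor and grows when a few exons deviate strongly.
    """
    if len(case) != len(control) or len(case) < 2:
        raise ValueError("need matching exon vectors with at least two exons")

    # Per-exon log2 fold change; the pseudocount (an assumption) avoids log(0).
    log_ratios = [
        math.log2((c + pseudocount) / (k + pseudocount))
        for c, k in zip(case, control)
    ]

    # Subtracting the median removes a gene-wide expression shift.
    center = median(log_ratios)
    weights = [2.0 ** abs(r - center) for r in log_ratios]

    # Normalize the deviations into a probability distribution over exons.
    total = sum(weights)
    probs = [w / total for w in weights]

    # Shannon entropy in bits; every p is > 0 because every weight is >= 1.
    entropy = -sum(p * math.log2(p) for p in probs)

    # Distance from the maximum entropy log2(m), reached for uniform probs.
    return math.log2(len(probs)) - entropy


# A gene whose exons all double gives a score of (numerically) zero,
# while a gene with one strongly depleted exon gives a clearly positive score.
uniform_change = entropy_splicing_score([20, 20, 20, 20], [10, 10, 10, 10])
one_exon_lost = entropy_splicing_score([100, 100, 1, 100], [100, 100, 100, 100])
```

In this sketch, pure differential expression leaves the score at zero because every exon deviates equally from the median, whereas a change confined to a single exon concentrates the probability mass and lowers the entropy.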