Sasidharan Rajkumar, Agarwal Ashish, Rozowsky Joel, Gerstein Mark
Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT 06520, USA.
BMC Res Notes. 2009 Jul 24;2:150. doi: 10.1186/1756-0500-2-150.
There are two main technologies for transcriptome profiling: tiling microarrays and high-throughput sequencing. The advent of next-generation sequencing technologies, and their promise, has recently generated a great deal of excitement about the latter. Consequently, the question of the moment is how the two technologies compare. Here we attempt to develop an approach for a fair comparison of transcripts identified from tiling microarray and MPSS sequencing data.
This comparison is challenging because the sequencing data are discrete while the tiling array data are continuous. We use the published rice and Arabidopsis datasets, which provide the best-matched sets of array and sequencing experiments currently available; they use a slightly earlier generation of sequencing, the MPSS tag sequencing technology. After scoring the arrays consistently in both organisms, a first-pass comparison reveals a surprisingly small overlap in transcripts: 22% in rice and 66% in Arabidopsis. A more detailed analysis, however, shows that these figures are underestimates. In particular, when we map the probe intensities onto the sequencing tags and examine the resulting intensity distribution, it is very similar to that of exons. Furthermore, restricting the comparison to protein-coding gene loci reveals a very good overlap between the two technologies.
Our approach to comparing genome tiling microarray and MPSS sequencing data suggests that there is actually a reasonable overlap in the transcripts identified by the two technologies. The apparent overlap is distorted by the scoring and thresholding steps of the tiling array analysis.
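As a rough illustration of the kind of first-pass overlap figure quoted above, the sketch below computes the fraction of sequencing tags that overlap at least one array-called transcribed region. It is not the authors' pipeline: the function name `fraction_of_tags_overlapping`, the half-open `(start, end)` interval representation, and the toy coordinates are all assumptions made for illustration, and real data would also need to be handled per chromosome and per strand.

```python
from bisect import bisect_right


def fraction_of_tags_overlapping(tags, regions):
    """Fraction of tags overlapping at least one region (hypothetical helper).

    tags, regions: lists of (start, end) half-open integer intervals,
    assumed to lie on the same chromosome and strand.
    """
    if not tags:
        return 0.0

    # Merge the regions so they are sorted and non-overlapping.
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    starts = [s for s, _ in merged]

    hits = 0
    for t_start, t_end in tags:
        # Index of the last merged region that starts before the tag ends.
        i = bisect_right(starts, t_end - 1) - 1
        # Because merged regions are disjoint and sorted, only this
        # region can overlap the tag.
        if i >= 0 and merged[i][1] > t_start:
            hits += 1
    return hits / len(tags)


# Toy coordinates, invented for illustration only.
toy_tags = [(100, 117), (500, 517), (900, 917)]
toy_regions = [(90, 200), (510, 600), (1000, 1100)]
overlap = fraction_of_tags_overlapping(toy_tags, toy_regions)
```

Because the regions come from thresholded array scores, lowering or raising that threshold changes `regions` and therefore the overlap, which is one way to see how the scoring procedure can distort the comparison.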