Chiara Matteo, Horner David S, Spada Alberto
Dipartimento di Bioscienze, Università degli Studi di Milano, Milano, Italia.
PLoS One. 2013 Dec 6;8(12):e80961. doi: 10.1371/journal.pone.0080961. eCollection 2013.
De novo transcriptome characterization from Next Generation Sequencing data has become an important approach in the study of non-model plants. Despite notable advances in the assembly of short reads, the clustering of transcripts into unigene-like (locus-specific) clusters remains a somewhat neglected subject. Indeed, current approaches often merge closely related paralogous transcripts into single clusters. Here, a novel heuristic method for locus-specific clustering is compared with the one implemented in the de novo assembler Oases, using the same initial transcript collections, derived from Arabidopsis thaliana and the developmental model Streptocarpus rexii. We show that the proposed approach improves cluster specificity in the A. thaliana dataset, for which a reference genome is available. Furthermore, for the S. rexii data, our filtered transcript collection matches a larger number of distinct annotated loci in reference genomes than the Oases set, while containing a smaller overall number of loci. We present a detailed discussion of the advantages and limitations of our approach in processing de novo transcriptome reconstructions. The proposed method should be widely applicable to other organisms, irrespective of the transcript assembly method employed. The S. rexii transcriptome is publicly available as an augmented online database.
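The notion of cluster specificity mentioned above can be illustrated with a minimal Python sketch. It is not the authors' heuristic or their evaluation procedure. It assumes each transcript has already been assigned to a cluster and, where possible, mapped to a single reference locus, and it scores a clustering as the fraction of clusters whose mapped transcripts all come from one locus. The function name `cluster_specificity`, the input dictionaries, and the toy identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_specificity(cluster_of, locus_of):
    """Return the fraction of clusters whose mapped transcripts share one locus.

    cluster_of: dict mapping transcript id -> cluster id (hypothetical input)
    locus_of:   dict mapping transcript id -> reference locus id (hypothetical input)
    """
    loci_per_cluster = defaultdict(set)
    for transcript, cluster in cluster_of.items():
        locus = locus_of.get(transcript)
        if locus is not None:  # ignore transcripts with no reference mapping
            loci_per_cluster[cluster].add(locus)
    if not loci_per_cluster:
        return 0.0
    specific = sum(1 for loci in loci_per_cluster.values() if len(loci) == 1)
    return specific / len(loci_per_cluster)


# Toy data (made up): cluster c1 is locus-specific, while c2 merges
# transcripts from two different loci, as can happen with close paralogs.
cluster_of = {"t1": "c1", "t2": "c1", "t3": "c2", "t4": "c2"}
locus_of = {"t1": "locusA", "t2": "locusA", "t3": "locusB", "t4": "locusC"}

specificity = cluster_specificity(cluster_of, locus_of)
print(specificity)
```

A score of this kind can only be computed where a reference genome exists, which is consistent with specificity being reported for the A. thaliana dataset and not for S. rexii.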