Department of Pediatrics, Linda Crnic Institute for Down Syndrome, University of Colorado Denver, Mail Stop 8608, 12700 E. 19th Avenue, Aurora, CO 80045, USA.
Genomics. 2012 Dec;100(6):357-62. doi: 10.1016/j.ygeno.2012.08.004. Epub 2012 Aug 20.
When applied to complex transcript datasets, current tools for automated assembly of mRNA sequences require long run times and produce exponentially increasing numbers of splice variants. Here, we describe RCDA, a genome-based transcript assembly tool comprising RCluster, which recursively clusters transcripts, and DAssemble, which generates composite transcript sequences through path-finding using a directed acyclic graph. Each exon included in a final transcript is associated with an array of all upstream consecutive exon structures obtained from the original transcripts. When a depth-first-search path reaches an exon, the path is retained only if it contains a structure from that exon's array. RCDA assemblies, therefore, include only those transcripts with experimentally supported exon patterns. When applied to >23,000 transcripts from human chromosome 21, using biologically reasonable filters, RCDA execution time was approximately 4 h. RCDA outperformed ECgene in reconstructing RefSeq transcripts and in limiting the total number of transcripts and transcripts per gene.
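The path-filtering idea described above can be illustrated with a minimal Python sketch. This is not the RCDA or DAssemble code. It assumes that each transcript is given as a list of exon identifiers in genomic order (so the exon graph is acyclic), and that an "upstream structure" is the run of up to `context` exons immediately preceding an exon in some original transcript. The names `build_graph`, `is_supported`, `assemble`, and the `context` parameter are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_graph(transcripts, context=2):
    """Collect exon-to-exon edges and, per exon, its observed upstream structures."""
    if context < 1:
        raise ValueError("context must be at least 1")
    successors = defaultdict(set)   # exon -> exons that directly follow it
    upstream = defaultdict(set)     # exon -> set of tuples of preceding exons
    exons, has_predecessor = set(), set()
    for chain in transcripts:
        exons.update(chain)
        for i in range(1, len(chain)):
            exon = chain[i]
            successors[chain[i - 1]].add(exon)
            has_predecessor.add(exon)
            # Assumption: a structure is the window of up to `context`
            # consecutive exons directly upstream of this exon.
            upstream[exon].add(tuple(chain[max(0, i - context):i]))
    sources = sorted(exons - has_predecessor)
    return successors, upstream, sources


def is_supported(path, exon, upstream):
    """True if the path ends with one of the exon's observed upstream structures."""
    return any(tuple(path[-len(s):]) == s for s in upstream.get(exon, ()))


def assemble(transcripts, context=2):
    """Enumerate source-to-sink paths by depth-first search, pruning unsupported steps."""
    successors, upstream, sources = build_graph(transcripts, context)
    assembled = []

    def dfs(path):
        next_exons = sorted(successors.get(path[-1], ()))
        if not next_exons:              # sink exon: the path is a full transcript
            assembled.append(tuple(path))
            return
        for exon in next_exons:
            # Extend only when the original data support this exon pattern;
            # a path whose every extension is rejected is dropped.
            if is_supported(path, exon, upstream):
                dfs(path + [exon])

    for source in sources:
        dfs([source])
    return assembled


# Toy input: three overlapping transcripts over exons e0..e6.
toy_transcripts = [
    ["e1", "e2", "e4", "e5"],
    ["e1", "e3", "e4", "e6"],
    ["e0", "e1", "e2"],
]
filtered = assemble(toy_transcripts, context=2)
unfiltered_like = assemble(toy_transcripts, context=1)
```

With `context=2`, the toy input yields two composite transcripts (`e0-e1-e2-e4-e5` and `e0-e1-e3-e4-e6`), whereas requiring only a single supported upstream exon (`context=1`) lets all four combinations through the shared exon `e4`. This is the combinatorial growth that the upstream-structure check is meant to limit.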