Center for Computational Biology of Human Disease, Brown University, Providence, RI, United States of America.
Center for Computation and Visualization, Brown University, Providence, RI, United States of America.
PLoS One. 2021 Jan 12;16(1):e0244202. doi: 10.1371/journal.pone.0244202. eCollection 2021.
A common transcriptome assembly error is to mistake different transcripts of the same gene for transcripts of multiple closely related genes. This error is difficult to identify during assembly, but in a phylogenetic analysis it can be diagnosed from gene phylogenies, where it appears as clades of tips from the same species with improbably short branch lengths. treeinform is a method that uses phylogenetic information across species to refine transcriptome assemblies within species. It identifies transcripts of the same gene that were incorrectly assigned to multiple genes and reassigns them as transcripts of the same gene. The treeinform method is implemented in Agalma, available at https://bitbucket.org/caseywdunn/agalma, and the general approach is relevant in a variety of other contexts.
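The diagnostic described above can be illustrated with a minimal sketch. This is not the Agalma implementation: the `Node` class, the `candidate_clades` function, the use of total within-clade branch length as the criterion, and the threshold values are all assumptions made for illustration. The sketch walks a gene tree from the root and reports the largest clades whose tips all come from one species and whose internal branch lengths sum to less than a chosen threshold.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    """Hypothetical gene-tree node; tips carry a transcript name and a species."""
    length: float = 0.0             # length of the branch leading to this node
    name: Optional[str] = None      # transcript identifier (tips only)
    species: Optional[str] = None   # species label (tips only)
    children: List["Node"] = field(default_factory=list)


def summarize(node: Node) -> Tuple[List[str], Set[str], float]:
    """Return (tip names, species present, total branch length inside the subtree)."""
    if not node.children:
        return [node.name], {node.species}, 0.0
    names: List[str] = []
    species: Set[str] = set()
    total = 0.0
    for child in node.children:
        child_names, child_species, child_total = summarize(child)
        names += child_names
        species |= child_species
        total += child_total + child.length
    return names, species, total


def candidate_clades(node: Node, threshold: float) -> List[List[str]]:
    """Find maximal single-species clades whose total length is below threshold."""
    names, species, total = summarize(node)
    if node.children and len(species) == 1 and total < threshold:
        return [sorted(names)]
    found: List[List[str]] = []
    for child in node.children:
        found += candidate_clades(child, threshold)
    return found


# Toy gene tree: two sp_A tips separated by very short branches, and two
# sp_B tips separated by long branches (plausibly genuine paralogs).
tree = Node(children=[
    Node(length=0.3, children=[
        Node(0.001, "A_t1", "sp_A"),
        Node(0.002, "A_t2", "sp_A"),
    ]),
    Node(length=0.4, children=[
        Node(0.2, "B_t1", "sp_B"),
        Node(0.3, "B_t2", "sp_B"),
    ]),
])

# Transcripts in each reported clade would be candidates for reassignment
# to a single gene; the threshold of 0.01 is an arbitrary illustrative value.
flagged = candidate_clades(tree, threshold=0.01)
```

With this toy tree and threshold, only the sp_A pair is flagged, while the sp_B pair is left as two genes because its branches are long. In practice the threshold would have to be chosen from the data, and the choice determines how aggressively transcripts are merged.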