Nam Kyoungwoo, Jeong Heesu, Nam Jin-Wu
Department of Life Science, College of Natural Sciences, Hanyang University, Seoul 04763, Korea.
College of Liberal Studies, Seoul National University, Seoul 08826, Korea.
Genes (Basel). 2016 Feb 24;7(3):10. doi: 10.3390/genes7030010.
High-throughput RNA sequencing (RNA-seq) provides a comprehensive picture of the transcriptome, including the identity, structure, quantity, and variability of expressed transcripts in cells, through the assembly of short sequenced reads. Although the reference-based approach yields a high-quality transcriptome, it is only applicable when a relevant reference genome is available. Here, we developed a pseudo-reference-based assembly (PRA) method that reconstructs a transcriptome using a linear regression function relating the optimized mapping parameters to the genetic distance from the closest species with a sequenced genome. Using the linear model, we reconstructed the transcriptomes of four avian species (White Leghorn chicken, turkey, duck, and zebra finch) with the Gallus gallus genome as a pseudo-reference, and of three primates (chimpanzee, gorilla, and macaque) with the human genome as a pseudo-reference. The resulting transcriptomes show that PRA outperformed the de novo approach for species whose orthologous transcripts differ from the pseudo-reference by a mutation rate of up to about 10%, which is enough to cover species as distantly related as chicken and duck. Taken together, we suggest that the PRA method can be used as a tool for reconstructing transcriptome maps of vertebrates whose genomes have not yet been sequenced.
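The regression idea in the abstract can be illustrated with a minimal sketch. It assumes that a single mapping parameter (here a hypothetical "maximum mismatches per read") has been optimized for a few species of known genetic distance, and that a straight line fitted to those points is then used to choose the parameter for a new species. The function names, the choice of parameter, and all numeric values are placeholders for illustration only; they are not taken from the paper.

```python
import numpy as np


def fit_mapping_model(distances, optimal_params):
    """Fit optimal_param = slope * distance + intercept by least squares."""
    slope, intercept = np.polyfit(distances, optimal_params, deg=1)
    return float(slope), float(intercept)


def predict_mapping_param(slope, intercept, distance):
    """Predict the mapping parameter for a species at the given distance."""
    return slope * distance + intercept


# Hypothetical calibration points (NOT measured data):
# genetic distance to the pseudo-reference, as a fraction of mismatched bases
calib_distances = np.array([0.01, 0.03, 0.06])
# Hypothetical optimized "maximum mismatches per read" at each distance
calib_params = np.array([2.0, 4.0, 7.0])

slope, intercept = fit_mapping_model(calib_distances, calib_params)

# Choose the parameter for a new species at an assumed distance of 0.05
predicted = predict_mapping_param(slope, intercept, 0.05)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted={predicted:.2f}")
```

In practice the predicted value would be rounded to whatever form the read mapper accepts before aligning the target species' reads to the pseudo-reference genome.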