Le Huong, Chen Chun, Goudar Chetan T
Drug Substance Technologies, Process Development, Amgen, Inc., 1 Amgen Center Drive, Thousand Oaks, CA, 91320.
Biotechnol Bioeng. 2015 Nov;112(11):2412-6. doi: 10.1002/bit.25649. Epub 2015 Jun 30.
While RNA-Seq is increasingly used as the method of choice for transcriptome analysis of mammalian cell culture processes, no universal genomic reference for mapping RNA-Seq reads from CHO cells has been reported. In previous publications, de novo transcriptomes assembled using these RNA-Seq reads were subsequently used for mapping. Potential caveats with this approach include the incomplete coverage and the non-universal nature of the de novo assemblies, leading to challenges in comparing results across studies. In order to facilitate future RNA-Seq studies in CHO cells, we performed a comprehensive evaluation of four public genomic references for CHO cells hosted by the NCBI Reference Sequence Database (RefSeq), including two annotated genomes released in 2012 and 2014 and their accompanying transcriptomes. Each genome showed significantly higher mapped rates compared to its accompanying transcriptome. Furthermore, higher mapped rates in deep intra-genic regions, especially within exons, were observed for the more recent genome release (2014) compared to the older one (2012), indicating that the 2014 genome was the preeminent reference among the four. Sequential addition of human and mouse genomes increased the total mapped rate to 87.3 and 89.7%, respectively, from 73.5% using the 2014 Chinese hamster genome alone. Thus, the sequential combination of the 2014 RefSeq Chinese hamster genome, the Ensembl human genome (h38), and the Ensembl mouse genome (m38) was suggested as the most effective strategy for mapping RNA-Seq data from CHO cells.
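The sequential strategy described in the abstract can be illustrated with a minimal sketch, assuming that "sequential" means only the reads left unmapped by one reference are passed on to the next. The names `sequential_map` and `incremental_gains`, the toy reads, and the prefix-based "mappers" are hypothetical stand-ins for a real aligner and are not taken from the paper. The second helper simply recomputes the per-reference gains implied by the reported cumulative rates (73.5%, 87.3%, 89.7%).

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for a real aligner: returns True if the read maps.
Mapper = Callable[[str], bool]


def sequential_map(
    reads: Iterable[str], references: List[Tuple[str, Mapper]]
) -> List[Tuple[str, float]]:
    """Map reads against each reference in order.

    Only reads left unmapped by earlier references are offered to the next
    one. Returns (reference_name, cumulative_mapped_fraction) pairs.
    """
    unmapped = list(reads)
    total = len(unmapped)
    if total == 0:
        raise ValueError("no reads supplied")
    mapped = 0
    results = []
    for name, maps in references:
        still_unmapped = [r for r in unmapped if not maps(r)]
        mapped += len(unmapped) - len(still_unmapped)
        unmapped = still_unmapped
        results.append((name, mapped / total))
    return results


def incremental_gains(cumulative_rates: List[float]) -> List[float]:
    """Convert cumulative mapped rates (in %) into per-reference gains."""
    gains = []
    previous = 0.0
    for rate in cumulative_rates:
        gains.append(round(rate - previous, 1))
        previous = rate
    return gains


if __name__ == "__main__":
    # Toy data: the prefix decides which (hypothetical) reference a read maps to.
    toy_reads = ["ham1", "ham2", "hum1", "mus1", "none1"]
    toy_refs = [
        ("hamster", lambda r: r.startswith("ham")),
        ("human", lambda r: r.startswith("hum")),
        ("mouse", lambda r: r.startswith("mus")),
    ]
    print(sequential_map(toy_reads, toy_refs))
    # Cumulative rates reported in the abstract, in percent.
    print(incremental_gains([73.5, 87.3, 89.7]))
```

Applied to the reported figures, the second helper shows that adding the human genome accounts for about 13.8 percentage points of additional mapped reads and the mouse genome for about 2.4 more. In a real pipeline the predicate functions would be replaced by aligner runs that write out unaligned reads for the next step.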