School of Biological Sciences and Technology, Chonnam National University, Gwangju, 61186, Republic of Korea.
Department of Biomedical Engineering, Chung-Ang University, Seoul, 06974, Republic of Korea.
BMC Bioinformatics. 2021 Oct 21;22(Suppl 11):310. doi: 10.1186/s12859-021-04226-0.
Recently, high-throughput RNA sequencing has been used extensively to elucidate the transcriptome landscape and dynamics of cell types in different species. In particular, for most non-model organisms lacking complete reference genomes with high-quality annotation of genetic information, reference-free (RF) de novo transcriptome analyses, rather than reference-based (RB) approaches, are widely used, and RF analyses have contributed substantially toward understanding the mechanisms regulating key biological processes and functions. To date, numerous bioinformatics studies have assessed the workflow, production rate, and completeness of transcriptome assemblies within and between RF and RB datasets. However, the degree of consistency and variability of the gene expression levels obtained through these two approaches has not been adequately documented.
In the present study, we evaluated the differences in expression profiles obtained with the RF and RB approaches and found that the former tends to be satisfactorily replaced by the latter, both with respect to transcriptome repertoires and from a gene expression quantification perspective. However, we urge cautious interpretation of these findings. In particular, genes that are lowly expressed, have long coding sequences, or belong to large gene families must be validated carefully whenever gene expression levels are calculated using the RF method.
Our empirical results make an important contribution toward addressing transcriptome-related biological questions in non-model organisms.