Ntasis Vasilis F, Guigó Roderic
Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain.
Department of Experimental and Health Sciences (DCEXS), Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain.
bioRxiv. 2024 Mar 11:2024.03.06.583744. doi: 10.1101/2024.03.06.583744.
The precise coordination of key biological processes, such as differentiation and development, depends heavily on the regulation of the expression of genetic information. The flow of genetic information is tightly regulated at multiple levels. Among these, RNA export to the cytosol is an essential step for protein production in eukaryotic cells. Estimating the relative concentration of RNA molecules of a given transcript species in the nucleus and in the cytosol is therefore of major significance, as it contributes to understanding the dynamics of RNA trafficking between the two compartments. The most efficient way to estimate the levels of RNA species genome-wide is RNA sequencing (RNA-seq). RNA-seq can be performed separately on the nucleus and on the cytosol, but transcript levels in the two compartments cannot be compared directly: measured levels are relative to the total volume of RNA in each compartment, and this volume is usually unknown. Here we show theoretically that if whole-cell RNA-seq is performed in addition to nuclear and cytosolic RNA-seq, accurate estimates of transcript localization can be obtained. Based on this, we designed a method that first estimates the fraction of the total RNA volume that is in the cytosol (or in the nucleus), and then estimates this fraction for every transcript. We evaluate our methodology on simulated data and on available nuclear and cytosolic single-cell data. Finally, we use our method to investigate the cellular localization of transcripts using bulk RNA-seq data from the ENCODE project.
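To illustrate why a whole-cell measurement makes the two compartments comparable, here is a minimal Python sketch. It is not the authors' implementation, and the abstract does not state their estimator. The sketch assumes a simple linear mixing model in which the whole-cell relative abundance of each transcript is a weighted average of its cytosolic and nuclear relative abundances, with the weight `beta` equal to the cytosolic fraction of total RNA. The function names, the least-squares fit, and the toy numbers are all hypothetical.

```python
# Assumption (not from the source): whole[i] = beta * cyt[i] + (1 - beta) * nuc[i],
# where nuc, cyt and whole are relative abundances (each summing to 1, e.g. TPM / 1e6)
# and beta is the fraction of total cellular RNA located in the cytosol.

def estimate_cytosolic_fraction(nuc, cyt, whole):
    """Least-squares estimate of beta from (whole - nuc) = beta * (cyt - nuc)."""
    num = 0.0
    den = 0.0
    for n, c, w in zip(nuc, cyt, whole):
        d = c - n
        num += (w - n) * d
        den += d * d
    if den == 0.0:
        raise ValueError("nuclear and cytosolic profiles are identical; beta is not identifiable")
    # Clip to [0, 1], since beta is a fraction.
    return min(1.0, max(0.0, num / den))


def transcript_cytosolic_fractions(nuc, cyt, beta):
    """Per-transcript share of molecules in the cytosol, given the global fraction beta."""
    fractions = []
    for n, c in zip(nuc, cyt):
        total = beta * c + (1.0 - beta) * n
        fractions.append(beta * c / total if total > 0 else float("nan"))
    return fractions


# Toy data (hypothetical): three transcripts, true cytosolic RNA fraction of 0.7.
nuc = [0.5, 0.3, 0.2]
cyt = [0.1, 0.3, 0.6]
true_beta = 0.7
whole = [true_beta * c + (1 - true_beta) * n for n, c in zip(nuc, cyt)]

beta = estimate_cytosolic_fraction(nuc, cyt, whole)
fractions = transcript_cytosolic_fractions(nuc, cyt, beta)
```

On this noise-free toy input the fit recovers the cytosolic fraction used to build `whole`, and the per-transcript fractions follow from rescaling each compartment's relative abundance by its share of total RNA. Real data would additionally need handling of noise, transcript length, and low-count transcripts, which this sketch ignores.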