Institute of Biomedical Chemistry, Pogodinskaya Street 10, 119121 Moscow, Russia.
Department of Cancer Cell Biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilova 32, 119991 Moscow, Russia.
Int J Mol Sci. 2023 Oct 24;24(21):15502. doi: 10.3390/ijms242115502.
Long-read RNA sequencing developed by Oxford Nanopore Technologies provides direct quantification of transcript isoforms. This makes the number of transcript isoforms per gene an intrinsically suitable metric for alternative splicing (AS) profiling with this particular type of RNA sequencing. Using this simple metric, together with principal component analysis (PCA) as a tool to visualize high-dimensional transcriptomic data, we were able to group biospecimens of normal human liver tissue and of the hepatocyte-derived malignant cell lines HepG2 and Huh7 into clear clusters in a 2D space. In the transcriptome-wide analysis, clustering was observed regardless of whether all genes were included or only those expressed in all biospecimens tested. However, when the analysis was restricted to pharmacogenes, a particular set of genes involved in drug metabolism, clustering worsened dramatically when only the genes expressed in all biospecimens were included. Based on the PCA data, the subsets of genes contributing most to the grouping of biospecimens into clusters were selected and subjected to gene ontology analysis, which identified the top 20 biological processes, dominated by translation and processes related to its regulation. The suggested metric can be a useful addition to existing metrics for describing AS profiles, especially in transcriptome studies based on long-read sequencing.
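The workflow summarized above (count isoforms per gene, optionally keep only genes expressed in every biospecimen, run PCA, rank genes by their contribution to the leading components) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the input layout (a dict of per-sample read counts per transcript plus a transcript-to-gene map), the detection threshold `min_count`, the variance-weighted contribution score, and all function names are hypothetical choices, and the gene and transcript names are made-up placeholders.

```python
import numpy as np


def isoform_count_matrix(transcript_counts, gene_of, n_samples, min_count=1):
    """Return (genes, matrix): matrix[i, j] is the number of isoforms of
    genes[j] detected in sample i, i.e. with at least min_count reads.
    The detection threshold is an illustrative assumption."""
    genes = sorted(set(gene_of.values()))
    column = {gene: j for j, gene in enumerate(genes)}
    matrix = np.zeros((n_samples, len(genes)))
    for transcript, counts in transcript_counts.items():
        j = column[gene_of[transcript]]
        for i, count in enumerate(counts):
            if count >= min_count:
                matrix[i, j] += 1
    return genes, matrix


def expressed_in_all(genes, matrix):
    """Keep only genes with at least one detected isoform in every sample."""
    keep = np.all(matrix > 0, axis=0)
    return [g for g, k in zip(genes, keep) if k], matrix[:, keep]


def pca(matrix, n_components=2):
    """PCA via SVD of the column-centered matrix (samples x genes)."""
    centered = matrix - matrix.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = vt[:n_components]  # one row of gene weights per component
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, loadings, explained


def top_contributing_genes(genes, loadings, explained, top_n=20):
    """Rank genes by squared loadings weighted by explained variance
    (a hypothetical contribution score, summed over the kept components)."""
    contribution = (explained[:, None] * loadings ** 2).sum(axis=0)
    order = np.argsort(contribution)[::-1][:top_n]
    return [genes[j] for j in order]


# Toy usage: samples 0-1 stand in for tissue, 2-3 for a cell line (made-up data).
transcript_counts = {
    "GA.1": [10, 12, 8, 9],
    "GA.2": [5, 7, 0, 0],
    "GA.3": [3, 4, 0, 0],
    "GB.1": [6, 5, 7, 8],
    "GB.2": [0, 0, 2, 3],
    "GC.1": [0, 1, 4, 5],
}
gene_of = {t: t.split(".")[0] for t in transcript_counts}

genes, matrix = isoform_count_matrix(transcript_counts, gene_of, n_samples=4)
kept_genes, kept_matrix = expressed_in_all(genes, matrix)  # GC is dropped
scores, loadings, explained = pca(matrix)
ranking = top_contributing_genes(genes, loadings, explained, top_n=3)
```

Because the sign of each component returned by an SVD is arbitrary, clusters are read from which samples lie together on PC1 and PC2, not from the sign itself. Weighting squared loadings by explained variance keeps the score tied to the 2D projection; summed over all components it would simply reduce to each gene's variance.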