Düz Elif, Çakır Tunahan
Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, 41400, Turkey.
Comput Biol Chem. 2024 Apr;109:108028. doi: 10.1016/j.compbiolchem.2024.108028. Epub 2024 Feb 8.
High-throughput RNA sequencing brings a new perspective to the elucidation of the molecular mechanisms of diseases. Normalization is the first and most important step in RNA-Seq data analysis, and the appropriate method depends on the purpose of the analysis. Within-sample normalization methods (e.g., TPM) are preferred when genes within a sample are compared with each other, and between-sample normalization methods (e.g., DESeq2, TMM, voom) are used when the samples in a dataset are compared. Normalization approaches rescale the data and therefore affect the results of the analysis. Here, we selected the two most commonly used Alzheimer's disease RNA-Seq datasets, from the ROSMAP and Mayo Clinic cohorts, and mapped the differentially expressed genes onto the human protein interactome to discover disease-specific subnetworks. To this end, the raw count data were first processed with four different, commonly used RNA-Seq normalization methods (DESeq2, TMM, voom and TPM). Covariate adjustment was then applied to the normalized data for gender, age at death and post-mortem interval. Each normalized dataset was separately mapped onto the human protein-protein interaction network in either covariate-adjusted or non-adjusted form. The normalization methods were compared by how well the discovered subnetworks captured known Alzheimer's disease genes and genes associated with disease-related functional terms. Based on our results, covariate adjustment has a positive effect on normalization by removing confounder effects. Covariate-adjusted TMM and covariate-adjusted DESeq2 performed better than the other approaches in both transcriptome datasets.
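To make the two processing steps concrete, the following is a minimal Python sketch of within-sample TPM normalization followed by covariate adjustment. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how covariate adjustment was implemented, so ordinary least-squares regression with removal of the fitted covariate effect is assumed here. The function names `tpm` and `adjust_covariates` are hypothetical, and covariates such as gender are assumed to be numerically encoded already.

```python
import numpy as np


def tpm(counts, gene_lengths_bp):
    """Within-sample normalization: transcripts per million.

    counts: genes x samples array of raw read counts.
    gene_lengths_bp: one length (in base pairs) per gene.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(gene_lengths_bp, dtype=float) / 1000.0
    # Reads per kilobase, so longer genes are not favored.
    rate = counts / lengths_kb[:, None]
    # Rescale each sample (column) so its values sum to one million.
    return rate / rate.sum(axis=0, keepdims=True) * 1e6


def adjust_covariates(expr, covariates):
    """Remove linear covariate effects from each gene (assumed OLS approach).

    expr: genes x samples array, typically on a log scale.
    covariates: samples x k numeric array (e.g. gender, age at death, PMI).
    """
    expr = np.asarray(expr, dtype=float)
    cov = np.asarray(covariates, dtype=float)
    # Centering the covariates keeps each gene's mean expression unchanged.
    cov_centered = cov - cov.mean(axis=0)
    design = np.column_stack([np.ones(cov.shape[0]), cov_centered])
    # One least-squares fit per gene; beta has shape (k + 1) x genes.
    beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    # Subtract only the covariate part, not the intercept.
    covariate_effect = cov_centered @ beta[1:]
    return expr - covariate_effect.T
```

In this sketch the adjusted values are uncorrelated with the supplied covariates by construction, which is the sense in which confounder effects are removed before differential expression analysis and network mapping. The between-sample methods named in the abstract (DESeq2, TMM, voom) are distributed as R/Bioconductor packages and are not reimplemented here.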