Department of Bioengineering, Gebze Technical University, Kocaeli, 41400, Turkey.
NPJ Syst Biol Appl. 2024 Oct 24;10(1):124. doi: 10.1038/s41540-024-00448-z.
Genome-scale metabolic models (GEMs) cover all metabolic genes in an organism and their associated reactions, without specificity to any tissue or condition. RNA-seq data provide the information needed to make GEMs condition-specific. The Integrative Metabolic Analysis Tool (iMAT) and Integrative Network Inference for Tissues (INIT) are the two most popular algorithms for creating condition-specific GEMs from human transcriptome data. The normalization method chosen for raw RNA-seq count data affects both the content of the models these algorithms produce and their predictive accuracy. However, the literature lacks a benchmark of how RNA-seq normalization methods affect the performance of iMAT and INIT. Covariates in a dataset, such as age and gender, are another important factor that can affect the predictive accuracy of an analysis. In this study, we compared five RNA-seq normalization methods (TPM, FPKM, TMM, GeTMM, and RLE), along with covariate-adjusted versions of the normalized data, by mapping them onto a human GEM with the iMAT and INIT algorithms to generate personalized metabolic models. We used RNA-seq data from Alzheimer's disease (AD) and lung adenocarcinoma (LUAD) patients. RNA-seq data normalized by the RLE, TMM, or GeTMM methods yielded condition-specific metabolic models with considerably lower variability in the number of active reactions than data normalized by the within-sample methods (FPKM, TPM). Models built from RLE-, TMM-, and GeTMM-normalized data also captured disease-associated genes more accurately (average accuracy of ~0.80 for AD and ~0.67 for LUAD). Covariate adjustment increased the accuracy of all methods. We found a similar accuracy trend when we compared the metabolites of perturbed reactions with metabolome data for AD. Together, our benchmark study shows that the between-sample RNA-seq normalization methods reduce false positive predictions at the expense of missing some true positive genes when mapped onto GEMs.
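To make the distinction between within-sample and between-sample normalization concrete, the following is a minimal Python sketch of TPM, FPKM, and the median-of-ratios size factors that underlie RLE. It is not the pipeline used in the study: the function names, the toy count matrix, and the choice to skip genes with a zero count in any sample are assumptions made for illustration. TMM and GeTMM are omitted for brevity.

```python
import math

def tpm(counts, lengths_kb):
    """Within-sample: divide counts by gene length, then rescale to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

def fpkm(counts, lengths_kb):
    """Within-sample: counts per kilobase of gene per million reads in the sample."""
    per_million = sum(counts) / 1e6
    return [c / per_million / l for c, l in zip(counts, lengths_kb)]

def rle_size_factors(count_matrix):
    """Between-sample: median-of-ratios size factors (the idea behind RLE).

    count_matrix[g][s] is the raw count of gene g in sample s.
    Assumption: genes with a zero count in any sample are skipped.
    """
    n_samples = len(count_matrix[0])
    usable = [row for row in count_matrix if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples (the pseudo-reference).
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        log_ratios = sorted(math.log(row[s]) - ref
                            for row, ref in zip(usable, log_ref))
        mid = len(log_ratios) // 2
        if len(log_ratios) % 2:
            median = log_ratios[mid]
        else:
            median = 0.5 * (log_ratios[mid - 1] + log_ratios[mid])
        factors.append(math.exp(median))
    return factors

# Hypothetical toy data: 3 genes x 2 samples, sample 2 sequenced twice as deeply.
toy_counts = [[10, 20], [100, 200], [50, 100]]
factors = rle_size_factors(toy_counts)
normalized = [[c / f for c, f in zip(row, factors)] for row in toy_counts]
```

In this toy case the second sample receives a size factor twice that of the first, so after dividing by the factors both samples carry identical values for every gene. TPM and FPKM, by contrast, are computed from one sample at a time and never use information from the other samples.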