Gogolewski Krzysztof, Kostecki Marcin, Gambin Anna
Institute of Informatics, University of Warsaw, 02-097 Warsaw, Poland.
Entropy (Basel). 2020 Oct 31;22(11):1238. doi: 10.3390/e22111238.
The rapidly growing volume of biological data from high-throughput experiments opens up new possibilities for data- and model-driven inference. At the same time, data integration techniques carry risks that are rarely taken into account. In particular, approaches based on flux balance analysis (FBA) are sensitive to the structure of the metabolic network, in which low-entropy clusters can prevent inference from the activity of metabolic reactions. In this article, we set out the problems that may arise when integrating metabolomic data with gene expression datasets. We analyze common pitfalls, propose possible solutions, and illustrate them with a case study of renal cell carcinoma (RCC). Using the proposed approach, we provide a metabolic description of the known morphological RCC subtypes and suggest the possible existence of a poor-prognosis cluster of patients, commonly characterized by low activity of the drug-transporting enzymes crucial in chemotherapy. This finding is consistent with, and extends, the already known poor-prognosis characteristics of RCC. Finally, this work also aims to point out a problem that arises when high-throughput data are integrated with inherently nonuniform, manually curated low-throughput data: in such cases, over-represented information may overshadow non-trivial discoveries.
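For readers unfamiliar with FBA, the following is a minimal sketch of the standard formulation only, not the method used in the article: fluxes v are chosen to maximize an objective subject to the steady-state constraint S v = 0 and per-reaction bounds. The three-reaction network, the bounds, the objective, and the function name `toy_fba` are all hypothetical and chosen purely for illustration.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (an assumption, not taken from the article):
#   R1: -> A        (uptake)
#   R2: A -> B      (conversion)
#   R3: B ->        (export, used here as the objective)
# Rows are metabolites, columns are reactions.
S = np.array([
    [1.0, -1.0, 0.0],   # metabolite A
    [0.0, 1.0, -1.0],   # metabolite B
])


def toy_fba(uptake_limit=10.0):
    """Maximize flux through R3 subject to S v = 0 and flux bounds."""
    # Lower and upper bounds for R1, R2, R3 (illustrative values).
    bounds = [(0.0, uptake_limit), (0.0, 1000.0), (0.0, 1000.0)]
    # linprog minimizes, so the objective coefficient is negated.
    c = np.array([0.0, 0.0, -1.0])
    result = linprog(
        c,
        A_eq=S,
        b_eq=np.zeros(S.shape[0]),
        bounds=bounds,
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x


fluxes = toy_fba()
print(fluxes)
```

In this linear chain the uptake bound alone determines every flux, which gives a small-scale picture of how strongly an FBA solution can depend on network structure.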