Bioinformatics Program, University of Michigan, Ann Arbor, MI 48109, USA.
BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S36. doi: 10.1186/1471-2105-12-S1-S36.
Metabolite profiles can be used to identify molecular signatures and mechanisms underlying diseases, since they reflect the outcome of complex upstream genomic, transcriptomic, proteomic and environmental events. The scarcity of publicly accessible large-scale metabolome datasets related to human disease has been a major obstacle both to assessing the potential of metabolites as biomarkers and to understanding the molecular events underlying disease-related metabolic changes. The availability of metabolite and gene expression profiles for the NCI-60 cell lines makes it possible to identify significant metabolome and transcriptome features and to discover unique molecular processes related to different cancer types.
We used a combination of analytical methods in the R statistical package to evaluate metabolic features associated with cancer cell lines of different tissue origins, to identify metabolite-gene correlations, and to detect outlier cell lines based on metabolome and transcriptome data. The statistical analysis results were integrated with metabolic pathway annotations and with the COSMIC and Tumorscape databases to explore the associated molecular mechanisms.
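The metabolite-gene correlation and outlier-detection steps can be illustrated with a minimal Python sketch. It is not the authors' R pipeline: it assumes one gene and one metabolite measured across the same cell lines, uses Pearson correlation, and flags as outliers the cell lines whose residual from a least-squares fit of metabolite on gene exceeds a hypothetical cutoff of 2 residual standard deviations. The function names `pearson` and `outlier_cell_lines`, the cutoff, and the toy data are illustrative assumptions, not values from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of measurements."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def outlier_cell_lines(gene, metab, names, z_cut=2.0):
    """Flag cell lines that deviate from the fitted gene-metabolite relationship.

    Fits metab = intercept + slope * gene by ordinary least squares, then
    returns the names whose |residual| / residual SD exceeds z_cut.
    z_cut = 2.0 is a hypothetical default, not a value from the study.
    """
    n = len(gene)
    mg, mm = sum(gene) / n, sum(metab) / n
    slope = sum((g - mg) * (m - mm) for g, m in zip(gene, metab)) / sum(
        (g - mg) ** 2 for g in gene
    )
    intercept = mm - slope * mg
    resid = [m - (intercept + slope * g) for g, m in zip(gene, metab)]
    # Residual standard deviation with n - 2 degrees of freedom (two fitted parameters).
    sd = sqrt(sum(r * r for r in resid) / (n - 2))
    if sd == 0:
        return []  # perfect fit: no cell line deviates
    return [name for name, r in zip(names, resid) if abs(r) / sd > z_cut]


if __name__ == "__main__":
    # Toy data (not from NCI-60): metabolite tracks 2 x gene except in "CL5".
    names = [f"CL{i}" for i in range(1, 11)]
    gene = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    metab = [2, 4, 6, 8, 30, 12, 14, 16, 18, 20]
    print(round(pearson(gene, metab), 2))  # → 0.67
    print(outlier_cell_lines(gene, metab, names))  # → ['CL5']
```

On the toy data, the single cell line that breaks the otherwise linear relationship is the only one flagged. A genome-scale analysis would repeat this over many gene-metabolite pairs and account for multiple testing, which this sketch omits.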
Our analysis reveals that although the NCI-60 metabolome dataset is quite noisy compared with the microarray-based transcriptome data, it does contain tissue-of-origin-specific signatures. We also identified biologically meaningful gene-metabolite associations. Most remarkably, several abnormal gene-metabolite relationships identified by our approach can be directly linked to known gene mutations and copy number variations in the corresponding cell lines.
Our results suggest that integrative metabolome and transcriptome analysis is a powerful method for understanding the molecular machinery underlying various pathophysiological processes. We expect that the availability of large-scale metabolome data in the coming years will significantly promote the discovery of novel biomarkers, which will in turn improve the understanding of the molecular mechanisms underlying diseases.