Fagan Ailís, Culhane Aedín C, Higgins Desmond G
Conway Institute for Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin, Ireland.
Proteomics. 2007 Jun;7(13):2162-71. doi: 10.1002/pmic.200600898.
To understand even the simplest cellular processes, we need to integrate proteomic, gene expression and other biomolecular data. To date, most computational approaches to integrating proteomics and gene expression data have used direct gene/protein correlation measures. However, because of post-transcriptional and translational regulation, the correspondence between the expression of a gene and the abundance of its protein is complicated. We apply a multivariate statistical method, co-inertia analysis (CIA), to visualise gene and proteomic expression data derived from the same biological samples. Principal components analysis or correspondence analysis can be used for data exploration on single datasets. CIA is then used to explore the relationships between two or more datasets. We further explore the data by projecting gene ontology (GO) information onto these plots to describe the cellular processes in action. We apply these techniques to gene expression and protein abundance data from studies of the human malarial parasite life cycle and the NCI-60 cancer cell lines. In each case, we visualise gene expression, protein abundance and GO classes in the same low-dimensional projections and identify GO classes that are likely to be of biological importance.
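As a rough illustration of the idea behind CIA, the sketch below finds paired axes in two tables measured on the same samples such that the sample scores on each pair of axes have maximal covariance. It is a minimal sketch, not the authors' implementation: it assumes PCA-style column centering with uniform sample weights (the correspondence analysis variant is not shown), and the function name `co_inertia`, its parameters and the returned keys are hypothetical.

```python
import numpy as np


def co_inertia(X, Y, n_axes=2):
    """Simplified co-inertia analysis of two tables sharing the same samples.

    X: samples x genes, Y: samples x proteins, rows in the same order.
    Assumption: PCA-style centering and uniform sample weights.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must describe the same samples (rows)")
    n = X.shape[0]

    # Centre each variable (column) across samples.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Cross-covariance between every gene and every protein.
    C = Xc.T @ Yc / n

    # The SVD gives paired axes that maximise covariance between the tables.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    gene_axes = U[:, :n_axes]
    protein_axes = Vt[:n_axes].T

    # Coordinates of each sample as seen from each dataset.
    gene_scores = Xc @ gene_axes
    protein_scores = Yc @ protein_axes

    # RV coefficient: overall agreement of the two tables, between 0 and 1.
    rv = np.sum(C ** 2) / np.sqrt(
        np.sum((Xc.T @ Xc / n) ** 2) * np.sum((Yc.T @ Yc / n) ** 2)
    )

    return {
        "gene_axes": gene_axes,
        "protein_axes": protein_axes,
        "gene_scores": gene_scores,
        "protein_scores": protein_scores,
        "co_inertia": s[:n_axes],  # covariance captured by each axis pair
        "rv": rv,
    }
```

Plotting `gene_scores` and `protein_scores` for the same sample in one low-dimensional plane, joined by a line, shows how closely the two data types agree for that sample: short lines indicate agreement, long lines indicate divergence.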