Non-linear mapping for exploratory data analysis in functional genomics.

Authors

Francisco Azuaje, Haiying Wang, Alban Chesneau

Affiliation

School of Computing and Mathematics, University of Ulster, BT37 0QB, UK.

Publication

BMC Bioinformatics. 2005 Jan 20;6:13. doi: 10.1186/1471-2105-6-13.

PMID:15661072

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC548129/

Abstract

BACKGROUND

Several supervised and unsupervised learning tools are available to classify functional genomics data. However, exploratory, visualisation-driven approaches have received relatively little attention. Such approaches should meet the following requirements: support for intuitive cluster visualisation, user-friendly and robust application, computational efficiency, and generation of biologically meaningful outcomes. This research assesses a relaxation method for non-linear mapping that addresses these requirements, and investigates its application to the analysis of gene expression and protein-protein interaction data.
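
The abstract does not spell out the relaxation update itself. The Python sketch below shows one common form of pairwise relaxation for distance-preserving non-linear mapping; it is an illustrative assumption, not the authors' implementation. Here the relaxation factor `step` stands in for the single control parameter mentioned in the Results, while `relaxation_map`, `stress`, `n_iter` and `seed` are hypothetical names chosen for this example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relaxation_map(data, dim=2, step=0.5, n_iter=200, seed=0):
    """Hypothetical sketch: place each high-dimensional point in `dim` dimensions
    so that pairwise distances approximate those of the original data."""
    rng = random.Random(seed)
    n = len(data)
    # Target distances, computed once in the original feature space.
    target = [[euclidean(data[i], data[j]) for j in range(n)] for i in range(n)]
    # Random starting configuration in the low-dimensional display space.
    y = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(n_iter):
        for i in range(n):
            for j in range(i + 1, n):
                d = euclidean(y[i], y[j]) or 1e-12  # guard against division by zero
                # Move i and j apart (or together) along the line joining them, so
                # their distance closes a fraction `step` of the gap to the target.
                delta = step * (target[i][j] - d) / (2.0 * d)
                for k in range(dim):
                    diff = y[i][k] - y[j][k]
                    y[i][k] += delta * diff
                    y[j][k] -= delta * diff
    return y


def stress(data, y):
    """Normalised stress: 0 means all original pairwise distances are preserved."""
    num = den = 0.0
    n = len(data)
    for i in range(n):
        for j in range(i + 1, n):
            t = euclidean(data[i], data[j])
            num += (t - euclidean(y[i], y[j])) ** 2
            den += t ** 2
    return math.sqrt(num / den) if den else 0.0
```

For example, `relaxation_map(profiles, step=0.5)` would return one 2-D coordinate pair per sample for plotting, where `profiles` is a hypothetical list of per-sample expression vectors. Updating one pair at a time keeps each step cheap and needs no class labels, but each sweep is O(n²), so very large gene sets would call for sampling or vectorisation.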

RESULTS

Publicly available expression data from leukaemia, round blue-cell tumour and Parkinson disease studies were analysed. The method distinguished relevant clusters and critical analysis areas. The system requires no assumptions about the inherent class structure of the data, its mapping process is controlled by a single parameter, and the resulting transformations offer intuitive, meaningful visual displays. Comparisons with traditional mapping models are presented. To illustrate potential alternative applications of the methodology, an example of exploratory analysis of interactome networks is presented, using data from the C. elegans interactome. The results suggest that this method may be an effective solution for detecting key network hubs and for clustering biologically meaningful groups of proteins.
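
The abstract does not state how the interactome was converted into input for the mapping. One plausible assumption, sketched below in Python, is to use the shortest-path length (number of interaction hops) between proteins as their distance, and to rank proteins by their number of distinct interaction partners as a simple hub indicator. Both choices, and the names `shortest_path_distances` and `degree_ranking`, are hypothetical and are not taken from the paper.

```python
import math
from collections import deque


def _adjacency(edges):
    """Undirected adjacency sets; duplicate interactions are counted once."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def shortest_path_distances(edges):
    """Hypothetical helper: all-pairs hop counts for an unweighted interaction network."""
    adj = _adjacency(edges)
    nodes = sorted(adj)
    dist = {}
    for s in nodes:
        # Breadth-first search from s gives hop counts to every reachable protein.
        seen = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        # Proteins in other connected components are marked as infinitely far away.
        dist[s] = {t: seen.get(t, math.inf) for t in nodes}
    return nodes, dist


def degree_ranking(edges):
    """Hypothetical helper: proteins sorted by number of distinct partners, highest first."""
    adj = _adjacency(edges)
    return sorted(((n, len(p)) for n, p in adj.items()), key=lambda kv: (-kv[1], kv[0]))
```

Infinite entries would have to be capped, or each connected component mapped separately, before such distances could drive a distance-preserving mapping. Degree ranking is only a simple stand-in for hub detection here, not the criterion used in the study.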

CONCLUSION

A relaxation method for non-linear mapping provided the basis for visualisation-driven analyses using different types of data. This study indicates that such a system may represent a user-friendly and robust approach to exploratory data analysis. It may allow users to gain better insights into the underlying data structure, detect potential outliers and assess assumptions about the cluster composition of the data.
