Department of Information Technology, Ghent University - iMinds, Gaston Crommenlaan 8 (201), 9050 Ghent, Belgium.
BMC Bioinformatics. 2014 May 19;15:151. doi: 10.1186/1471-2105-15-151.
With the growing number and complexity of available gene expression datasets, combining data from multiple microarray studies that address a similar biological question is gaining importance. Analysing and integrating multiple datasets is expected to yield more reliable and robust results, since the results are based on a larger number of samples and the effects of individual study-specific biases are diminished. This is supported by recent studies suggesting that important biological signals are often preserved or enhanced by multiple experiments. One approach to combining data from different experiments is to aggregate their clusterings into a consensus, or representative, clustering solution, which increases confidence in the features common to all the datasets and reveals the important differences among them.
We propose a novel generic consensus clustering technique that applies the Formal Concept Analysis (FCA) approach to the consolidation and analysis of clustering solutions derived from several microarray datasets. These datasets are initially divided into groups of related experiments with respect to a predefined criterion. Subsequently, a consensus clustering algorithm is applied to each group, resulting in one clustering solution per group. These solutions are pooled together and further analysed by employing FCA, which allows valuable insights to be extracted from the data and generates a gene partition over all the experiments. To validate the FCA-enhanced approach, two consensus clustering algorithms are adapted to incorporate the FCA analysis. Their performance is evaluated on gene expression data from a multi-experiment study examining the global cell-cycle control of fission yeast. The FCA results derived from both methods demonstrate that, although the two algorithms optimize different clustering characteristics, FCA is able to overcome and diminish these differences and preserve some relevant biological signals.
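The pooling and FCA step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the formal context is built, so the sketch assumes that objects are genes and attributes are (group, cluster label) pairs. The function names `build_context`, `gene_partition` and `formal_concepts`, and the toy data, are hypothetical.

```python
# Minimal sketch, assuming a formal context whose objects are genes and whose
# attributes are (group, cluster label) pairs. All names and data are illustrative.

def build_context(solutions):
    """Map each gene to the set of (group, cluster) attributes it carries.

    solutions: dict mapping group name -> dict mapping gene -> cluster label.
    Only genes present in every group's solution are kept.
    """
    genes = sorted(set.intersection(*(set(s) for s in solutions.values())))
    return {
        g: frozenset((grp, sol[g]) for grp, sol in solutions.items())
        for g in genes
    }

def gene_partition(context):
    """Group genes that share exactly the same attribute set."""
    blocks = {}
    for gene, intent in context.items():
        blocks.setdefault(intent, []).append(gene)
    return sorted(sorted(block) for block in blocks.values())

def formal_concepts(context):
    """Enumerate all formal concepts as (extent, intent) pairs.

    Intents are the full attribute set plus every intersection of object
    intents; each extent is the set of genes carrying all attributes of
    the corresponding intent.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}
    for obj_intent in context.values():
        intents |= {obj_intent & i for i in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(g for g, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts

# Toy input: one clustering solution per group of related experiments.
solutions = {
    "A": {"g1": 0, "g2": 0, "g3": 1, "g4": 0},
    "B": {"g1": 0, "g2": 1, "g3": 1, "g4": 0},
}
context = build_context(solutions)
partition = gene_partition(context)
concepts = formal_concepts(context)
```

In this toy input, genes g1 and g4 are clustered together in both groups and therefore end up in the same block of the partition, whereas g2 is separated from them because group B assigns it to a different cluster. The concepts with smaller intents, such as the one carrying only the attribute ("A", 0), capture genes that agree in some groups but not in all of them.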
The proposed FCA-enhanced consensus clustering technique is a general approach to combining clustering algorithms with FCA in order to derive clustering solutions from multiple gene expression matrices. The experimental results presented herein demonstrate that it is a robust data integration technique, able to produce a good-quality clustering solution that is representative of the whole set of expression matrices.