Sun Peng, Guo Jiong, Baumbach Jan
Computational Systems Biology Group, Max Planck Institute for Informatics, Campus E1.4, 66123 Saarbrücken, Germany.
J Integr Bioinform. 2012 Jul 17;9(2):197. doi: 10.2390/biecoll-jib-2012-197.
The explosion of biological data has strongly shaped the focus of today’s biological research. Integrating and analysing large quantities of data to provide meaningful insights has become the main challenge for biologists and bioinformaticians. One major problem is the combined analysis of data of different types, such as phenotypes and genotypes. Such data can be modelled as bi-partite graphs in which nodes correspond to the different data points (mutations and diseases, for instance) and weighted edges represent associations between them. Bi-clustering is a special case of clustering designed for partitioning two different types of data simultaneously. We present a bi-clustering approach that solves the NP-hard weighted bi-cluster editing problem by transforming a given bi-partite graph into a disjoint union of bi-cliques. We contribute an exact algorithm based on fixed-parameter tractability. We first evaluated its performance on artificial graphs. Afterwards, as an example, we applied our Java implementation to data from genome-wide association studies (GWAS), aiming to discover new, previously unobserved genotype-to-phenotype associations. We believe that our results will serve as guidelines for further wet-lab investigations. In general, our software can be applied to any kind of data that can be modelled as bi-partite graphs. To our knowledge, it is the fastest exact method for the weighted bi-cluster editing problem.
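To make the problem statement concrete, the following Java sketch shows what "editing" a weighted bi-partite graph into a disjoint union of bi-cliques costs. It is not the fixed-parameter algorithm described in the abstract. It is a plain exhaustive search that is only feasible for a handful of nodes. The cost model is an assumption made for illustration: a positive weight means an edge is present and costs its weight to delete, while a non-positive weight means the edge is absent and costs its absolute value to insert. The class and method names (`BiClusterEditingSketch`, `editingCost`, `bruteForceOptimum`) are hypothetical.

```java
import java.util.Arrays;

class BiClusterEditingSketch {

    /**
     * Cost of editing the graph so that its bi-cliques are exactly the
     * clusters given by the label arrays (same label = same bi-cluster).
     * Assumed cost model: w[i][j] > 0 is an existing edge, otherwise a non-edge.
     */
    static double editingCost(double[][] w, int[] leftLabel, int[] rightLabel) {
        double cost = 0.0;
        for (int i = 0; i < w.length; i++) {
            for (int j = 0; j < w[i].length; j++) {
                boolean sameCluster = leftLabel[i] == rightLabel[j];
                boolean edge = w[i][j] > 0;
                if (sameCluster && !edge) {
                    cost += -w[i][j];   // insert a missing edge inside a cluster
                } else if (!sameCluster && edge) {
                    cost += w[i][j];    // delete an edge between two clusters
                }
            }
        }
        return cost;
    }

    /** Minimum editing cost over all partitions of the nodes (tiny graphs only). */
    static double bruteForceOptimum(double[][] w) {
        int[] label = new int[w.length + w[0].length];
        return search(w, label, 0, 0);
    }

    // Enumerates every set partition once: a node may join an existing
    // cluster (labels 0..usedLabels-1) or open a new one (label usedLabels).
    private static double search(double[][] w, int[] label, int pos, int usedLabels) {
        int nLeft = w.length;
        if (pos == label.length) {
            int[] left = Arrays.copyOfRange(label, 0, nLeft);
            int[] right = Arrays.copyOfRange(label, nLeft, label.length);
            return editingCost(w, left, right);
        }
        double best = Double.POSITIVE_INFINITY;
        for (int c = 0; c <= usedLabels; c++) {
            label[pos] = c;
            int nextUsed = Math.max(usedLabels, c + 1);
            best = Math.min(best, search(w, label, pos + 1, nextUsed));
        }
        return best;
    }

    public static void main(String[] args) {
        // Rows: three hypothetical mutations; columns: two hypothetical diseases.
        double[][] w = {
            { 2,  1},
            { 3, -1},
            {-2,  4}
        };
        // Forcing everything into one bi-clique requires two insertions.
        System.out.println(editingCost(w, new int[]{0, 0, 0}, new int[]{0, 0})); // 3.0
        // The cheapest solution deletes the single weight-1 edge instead.
        System.out.println(bruteForceOptimum(w)); // 1.0
    }
}
```

In this toy instance the edges form a path through all five nodes, so at least one edit is needed. Deleting the weakest edge splits the graph into two bi-cliques at cost 1, whereas merging everything into one bi-clique costs 3. The number of partitions grows super-exponentially with the number of nodes, which is why exact methods for realistic inputs need a more refined strategy such as the fixed-parameter approach the abstract refers to.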