Zhao Hongya, Wang Debby D, Chen Long, Liu Xinyu, Yan Hong
Industrial Center, Shenzhen Polytechnic, Shenzhen, China.
Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong.
PLoS One. 2016 Sep 6;11(9):e0162293. doi: 10.1371/journal.pone.0162293. eCollection 2016.
Co-clustering, often called biclustering for two-dimensional data, has found many applications, such as gene expression data analysis and text mining. Nowadays, a variety of multi-dimensional arrays (tensors) frequently occur in data analysis tasks, and co-clustering techniques play a key role in dealing with such datasets. Co-clusters represent coherent patterns and exhibit important properties along all the modes. Developing robust co-clustering techniques is therefore important for detecting and analyzing these patterns. In this paper, we propose a co-clustering method based on hyperplane detection in singular vector spaces (HDSVS). In this method, higher-order singular value decomposition (HOSVD) decomposes a tensor into a core tensor and a singular vector matrix along each mode. The row vectors of each singular vector matrix can then be clustered by a linear grouping algorithm (LGA). The hyperplanar patterns extracted in this way successfully support the identification of multi-dimensional co-clusters. To validate HDSVS, we used a number of synthetic and biological tensors. Experiments on synthetic tensors demonstrated favorable performance of the algorithm on noisy or overlapping data. Experiments with gene expression data and lineage data of embryonic cells further verified the reliability of HDSVS on practical problems. Moreover, the detected co-clusters are highly consistent with important genetic pathways and gene ontology annotations. Finally, a series of comparisons between HDSVS and state-of-the-art methods on synthetic tensors and a yeast gene expression tensor verified the robust and stable performance of our method.
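The decomposition-then-grouping pipeline described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. `hosvd` computes an untruncated HOSVD from the left singular vectors of each mode unfolding. `linear_grouping` is a simplified LGA-style procedure: it uses random starts, assigns points by orthogonal distance, and refits each hyperplane by total least squares. The published LGA includes refinements omitted here. All function names and parameters (`n_starts`, `n_iter`, `seed`) are hypothetical. The step that assembles detected hyperplanes into co-clusters is not shown.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """n-mode product: multiply `tensor` by `matrix` along axis `mode`."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)


def hosvd(tensor):
    """Untruncated HOSVD: one orthonormal factor matrix per mode plus a core tensor."""
    factors = []
    for mode in range(tensor.ndim):
        # Left singular vectors of the mode-n unfolding span that mode's fibers.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # core = X x_1 U1^T x_2 U2^T ...
    return core, factors


def reconstruct(core, factors):
    """Inverse step: X = core x_1 U1 x_2 U2 ... x_N UN."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


def fit_hyperplane(points):
    """Total-least-squares hyperplane {x : w.x = b} with unit normal w."""
    center = points.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(points - center, full_matrices=True)
    normal = vt[-1]
    return normal, normal @ center


def distances(points, planes):
    """Orthogonal distance from every point to every hyperplane (n x k)."""
    return np.column_stack([np.abs(points @ w - b) for w, b in planes])


def linear_grouping(points, k, n_starts=20, n_iter=30, seed=0):
    """Simplified LGA-style grouping of row vectors into k hyperplanes."""
    rng = np.random.default_rng(seed)
    n, d = points.shape
    best_labels, best_cost = None, np.inf
    for _ in range(n_starts):
        # Each start seeds every hyperplane with d randomly chosen points.
        planes = [fit_hyperplane(points[rng.choice(n, d, replace=False)]) for _ in range(k)]
        for _ in range(n_iter):
            labels = distances(points, planes).argmin(axis=1)
            # Refit each group; keep the old plane if too few members remain.
            planes = [
                fit_hyperplane(points[labels == j]) if np.sum(labels == j) >= d else planes[j]
                for j in range(k)
            ]
        dist = distances(points, planes)
        labels = dist.argmin(axis=1)
        cost = np.sum(dist.min(axis=1) ** 2)  # sum of squared orthogonal residuals
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels
```

As an illustration only, you could pass the rows of one mode's factor matrix, restricted to a few leading singular vectors, to the grouping step, e.g. `linear_grouping(factors[0][:, :2], k=2)`. How many singular vectors to keep and the number of hyperplanes `k` are assumptions here, not values taken from the paper.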