Shifting and scaling patterns from gene expression data.

Author information

Aguilar-Ruiz Jesús S

Affiliation

BIGS BioInformatics Group Seville, University of Seville, Pablo de Olavide University, Spain.

Publication information

Bioinformatics. 2005 Oct 15;21(20):3840-5. doi: 10.1093/bioinformatics/bti641. Epub 2005 Sep 6.

DOI:10.1093/bioinformatics/bti641

PMID:16144809

Abstract

MOTIVATION

In recent years, the discovery of biclusters in data has become increasingly popular. Biclustering aims to extract a set of clusters, each of which may use a different subset of attributes. Its usefulness therefore goes beyond that of traditional clustering techniques, especially when datasets have high or very high dimensionality. Biclustering also allows clusters to overlap, which is interesting both algorithmically and for the interpretation of results. Since the work of Cheng and Church, the mean squared residue has become one of the most popular measures for finding biclusters, which ideally should capture shifting and scaling patterns.

RESULTS

In this work, we identify both types of patterns (shifting and scaling) and demonstrate that the mean squared residue is very useful for finding shifting patterns but is not appropriate for finding scaling patterns, because even for a perfect scaling pattern the mean squared residue is not zero. In addition, we show that the mean squared residue depends strongly on the variance of the scaling factor, so any algorithm based on this measure may fail to find these patterns when the variance of gene values is high. The main contribution of this paper is to prove that the mean squared residue is not mathematically precise enough to discover shifting and scaling patterns at the same time.
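The claim about shifting versus scaling patterns can be checked numerically. The sketch below is an illustration, not the paper's own code. It assumes the usual Cheng and Church definition of the mean squared residue (each entry minus its row mean, minus its column mean, plus the overall mean, squared and averaged), and it assumes a shifting pattern has the form base[j] + shift[i] and a scaling pattern the form base[j] * scale[i]. The names `base`, `shifts`, and `scales` and their values are hypothetical. Under these assumptions, the residue of a perfect scaling pattern works out to (scale[i] - mean scale) * (base[j] - mean base), so the mean squared residue equals the product of the two population variances and vanishes only when one of them is zero.

```python
def mean_squared_residue(matrix):
    """Mean squared residue of a bicluster given as a list of equal-length rows."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = matrix[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


def population_variance(values):
    """Variance with divisor n (not n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical data: one base profile over four conditions, three genes.
base = [1.0, 4.0, 2.0, 7.0]
shifts = [0.0, 3.0, -2.0]   # additive offset per gene (assumed shifting model)
scales = [1.0, 2.0, 4.0]    # multiplicative factor per gene (assumed scaling model)

shifting = [[p + b for p in base] for b in shifts]
scaling = [[p * a for p in base] for a in scales]

msr_shift = mean_squared_residue(shifting)
msr_scale = mean_squared_residue(scaling)
predicted = population_variance(scales) * population_variance(base)

print("MSR, perfect shifting pattern:", msr_shift)
print("MSR, perfect scaling pattern: ", msr_scale)
print("var(scales) * var(base):      ", predicted)
```

Increasing the spread of `scales` or of `base` raises the residue of the scaling pattern, which is consistent with the dependence on variance described in the abstract.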

CONTACT

aguilar@lsi.us.es.
