Sacchi Lucia, Bellazzi Riccardo, Larizza Cristiana, Magni Paolo, Curk Tomaz, Petrovic Uros, Zupan Blaz
Dipartimento di Informatica e Sistemistica, Università di Pavia, via Ferrata 1, 27100 Pavia, Italy.
Int J Med Inform. 2005 Aug;74(7-8):505-17. doi: 10.1016/j.ijmedinf.2005.03.014.
This paper describes a new technique for clustering short time series of gene expression data. The technique generalizes template-based clustering and relies on a qualitative representation of profiles, which are labelled using trend Temporal Abstractions (TAs); clusters are then identified dynamically on the basis of this qualitative representation. Clustering is performed efficiently at three levels of aggregation of the qualitative labels, each level corresponding to a distinct degree of qualitative representation. The resulting TA-clustering algorithm provides an innovative way to cluster gene profiles. We show that the method is robust and efficient, and that it performs better than standard hierarchical agglomerative clustering when dealing with temporal dislocations of time series. Results of the TA-clustering algorithm can be visualized as a three-level hierarchical tree of qualitative representations and are therefore easy to interpret. We demonstrate the utility of the proposed algorithm on two simulated data sets and on a study of gene expression data from S. cerevisiae.
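The idea of labelling profiles with trend abstractions and grouping them at three levels of aggregation can be illustrated with a minimal Python sketch. It is not the published TA-clustering algorithm: the abstract does not define the three levels, so the ones used here (per-step labels, run-collapsed labels, and run-collapsed labels with steady phases dropped) are assumptions, as are the function names, the `threshold` parameter, and the toy profiles.

```python
from collections import defaultdict
from itertools import groupby


def trend_labels(profile, threshold=0.5):
    """Label each consecutive step as I (increasing), D (decreasing) or S (steady).

    The fixed threshold is an assumption made for this sketch.
    """
    labels = []
    for prev, curr in zip(profile, profile[1:]):
        delta = curr - prev
        if delta > threshold:
            labels.append("I")
        elif delta < -threshold:
            labels.append("D")
        else:
            labels.append("S")
    return "".join(labels)


def collapse(labels):
    """Merge runs of identical labels, e.g. 'IISDD' -> 'ISD'."""
    return "".join(key for key, _ in groupby(labels))


def qualitative_levels(profile, threshold=0.5):
    """Return (coarse, medium, fine) labels; the three levels are assumed."""
    fine = trend_labels(profile, threshold)           # one label per time step
    medium = collapse(fine)                           # sequence of trend episodes
    coarse = collapse(medium.replace("S", "")) or "S" # steady phases ignored
    return coarse, medium, fine


def ta_cluster(profiles, threshold=0.5):
    """Group profile names into a three-level tree keyed by qualitative labels."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for name, profile in profiles.items():
        coarse, medium, fine = qualitative_levels(profile, threshold)
        tree[coarse][medium][fine].append(name)
    return tree


# Hypothetical toy profiles, not data from the paper.
profiles = {
    "gene_a": [0.0, 1.0, 2.0, 2.1, 1.0, 0.0],
    "gene_b": [0.0, 1.0, 1.1, 1.2, 0.1, -1.0],
    "gene_c": [2.0, 1.0, 0.0, 0.1, 1.2, 2.3],
}
tree = ta_cluster(profiles)
```

In this sketch `gene_a` and `gene_b` differ in when the rise ends and the plateau begins, so their finest labels differ, yet they fall into the same group once runs of identical labels are merged. That is one simple way to see why a qualitative representation can tolerate temporal dislocations that affect a point-wise distance.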