Computational cluster validation in post-genomic data analysis.

Author information

Handl Julia, Knowles Joshua, Kell Douglas B

Affiliation

School of Chemistry, University of Manchester, Faraday Building, Sackville Street, PO Box 88, Manchester M60 1QD, UK.

Publication information

Bioinformatics. 2005 Aug 1;21(15):3201-12. doi: 10.1093/bioinformatics/bti517. Epub 2005 May 24.

DOI:10.1093/bioinformatics/bti517

PMID:15914541

Abstract

MOTIVATION

The discovery of novel biological knowledge from the ab initio analysis of post-genomic data relies upon the use of unsupervised processing methods, in particular clustering techniques. Much recent research in bioinformatics has therefore been focused on the transfer of clustering methods introduced in other scientific fields and on the development of novel algorithms specifically designed to tackle the challenges posed by post-genomic data. The partitions returned by a clustering algorithm are commonly validated using visual inspection and concordance with prior biological knowledge; whether the clusters actually correspond to the real structure in the data is somewhat less frequently considered. Suitable computational cluster validation techniques are available in the general data-mining literature, but have been given only a fraction of the same attention in bioinformatics.

RESULTS

This review paper aims to familiarize the reader with the battery of techniques available for the validation of clustering results, with a particular focus on their application to post-genomic data analysis. Synthetic and real biological datasets are used to demonstrate the benefits, and also some of the perils, of analytical cluster validation.
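
The abstract does not name the individual validation measures it covers. As one illustration of what computational (internal) cluster validation can look like, the sketch below computes the mean silhouette width, a standard measure from the general clustering literature, for a given partition. The function name `silhouette_width`, the use of Euclidean distance, and the toy data are assumptions made for this example; this is not the authors' software.

```python
import math


def silhouette_width(points, labels):
    """Mean silhouette width of a partition (Euclidean distance).

    Illustrative sketch only: values near 1 suggest compact,
    well-separated clusters; values near or below 0 suggest a
    partition that does not match the structure in the data.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes 0 by convention
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(  # separation: mean distance to the nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (hypothetical): two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_width(points, [0, 0, 1, 1])  # partition matching the groups
bad = silhouette_width(points, [0, 1, 0, 1])   # partition cutting across them
print(good > bad)
```

On this toy data the partition that follows the two groups scores higher than the one that cuts across them, which is the kind of signal an internal validation measure is meant to provide. A high score alone does not establish biological relevance.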

AVAILABILITY

The software used in the experiments is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/.

SUPPLEMENTARY INFORMATION

Enlarged colour plots are provided in the Supplementary Material, which is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/.
