Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements.

Affiliation

Systems Biology Centre, University of Warwick, Coventry, UK.

Publication information

BMC Bioinformatics. 2011 Oct 13;12:399. doi: 10.1186/1471-2105-12-399.

Abstract

BACKGROUND

Post-genomic molecular biology has resulted in an explosion of data, providing measurements for large numbers of genes, proteins and metabolites. Time series experiments have become increasingly common, necessitating the development of novel analysis tools that capture the resulting data structure. Outlier measurements at one or more time points present a significant challenge, while potentially valuable replicate information is often ignored by existing techniques.

RESULTS

We present a generative model-based Bayesian hierarchical clustering algorithm for microarray time series that employs Gaussian process regression to capture the structure of the data. By using a mixture model likelihood, our method permits a small proportion of the data to be modelled as outlier measurements, and adopts an empirical Bayes approach which uses replicate observations to inform a prior distribution of the noise variance. The method automatically learns the optimum number of clusters and can incorporate non-uniformly sampled time points. Using a wide variety of experimental data sets, we show that our algorithm consistently yields higher quality and more biologically meaningful clusters than current state-of-the-art methodologies. We highlight the importance of modelling outlier values by demonstrating that noisy genes can be grouped with other genes of similar biological function. We demonstrate the importance of including replicate information, which we find enables the discrimination of additional distinct expression profiles.
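As a rough illustration of the idea described above, the following Python sketch scores whether two groups of expression profiles are better explained by one shared Gaussian process or by two separate ones. It is not the BHC package's implementation: the squared-exponential kernel, the fixed hyperparameter values, and the function names (`sq_exp_kernel`, `gp_log_marginal`, `merge_log_odds`) are all assumptions made for this example, and the score omits the prior weights over tree structures, the outlier mixture likelihood, and the replicate-informed noise prior that the abstract mentions.

```python
import numpy as np

def sq_exp_kernel(t1, t2, signal_var, length_scale):
    """Squared-exponential covariance between two vectors of time points."""
    d = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_log_marginal(times, profiles, signal_var=1.0, length_scale=2.0,
                    noise_var=0.1):
    """Log marginal likelihood of profiles (genes x time points) under the
    assumption that all genes share one latent GP function plus i.i.d.
    Gaussian noise. Hyperparameters are fixed illustrative values."""
    n_genes, _ = profiles.shape
    t = np.tile(times, n_genes)   # time stamp of each stacked observation
    y = profiles.ravel()          # gene 0's series, then gene 1's, ...
    K = sq_exp_kernel(t, t, signal_var, length_scale)
    K += noise_var * np.eye(t.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * y.size * np.log(2.0 * np.pi))

def merge_log_odds(times, group_a, group_b, **hyper):
    """Log ratio: one shared GP for both groups vs. a separate GP for each.
    Simplified score; prior weights over partitions are left out."""
    merged = np.vstack([group_a, group_b])
    return (gp_log_marginal(times, merged, **hyper)
            - gp_log_marginal(times, group_a, **hyper)
            - gp_log_marginal(times, group_b, **hyper))

# Non-uniformly sampled time points need no special handling in a GP.
times = np.array([0.0, 1.0, 2.0, 4.0, 7.0, 11.0])
gene_a = np.sin(times / 2.0)[None, :]
gene_b = gene_a + 0.05            # nearly the same profile as gene_a
gene_c = -gene_a                  # mirrored profile

score_similar = merge_log_odds(times, gene_a, gene_b)
score_opposed = merge_log_odds(times, gene_a, gene_c)
print(score_similar, score_opposed)
```

In an agglomerative scheme of this kind, the pair with the highest merge score would be joined first; here the near-identical pair scores higher than the mirrored pair. Because the kernel is evaluated on the actual time stamps, unevenly spaced sampling is handled without interpolation.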

CONCLUSIONS

By incorporating outlier measurements and replicate values, this clustering algorithm for time series microarray data provides a step towards a better treatment of the noise inherent in measurements from high-throughput genomic technologies. Timeseries BHC is part of the R package 'BHC' (version 1.5), which can be downloaded from Bioconductor (version 2.9 and above) via http://www.bioconductor.org/packages/release/bioc/html/BHC.html?pagewanted=all.
