An approach for clustering gene expression data with error information.

Author information

Tjaden Brian

Affiliation

Computer Science Department, Wellesley College, Wellesley, MA 02481, USA.

Publication information

BMC Bioinformatics. 2006 Jan 12;7:17. doi: 10.1186/1471-2105-7-17.

Abstract

BACKGROUND

Clustering of gene expression patterns is a well-studied technique for elucidating trends across large numbers of transcripts and for identifying likely co-regulated genes. Even the best clustering methods, however, are unlikely to provide meaningful results if too much of the data is unreliable. As microarray technology has matured, a wealth of research on the statistical analysis of gene expression data has encouraged researchers to consider error and uncertainty in their microarray experiments. As a result, experiments are increasingly performed with repeat spots per gene per chip and with repeat experiments. One of the challenges is to incorporate this measurement error information into downstream analyses of gene expression data, such as traditional clustering techniques.

RESULTS

In this study, a clustering approach is presented that incorporates both gene expression values and error information about the expression measurements. Using repeat expression measurements, the error of each gene expression measurement in each experimental condition is estimated, and this measurement error information is incorporated directly into the clustering algorithm. The algorithm, CORE (Clustering Of Repeat Expression data), is presented and its performance is validated using statistical measures. By using error information about gene expression measurements, the clustering approach is less sensitive to noise in the underlying data and is able to achieve more accurate clusterings. Results are described for both synthetic expression data and real gene expression data from Escherichia coli and Saccharomyces cerevisiae.
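The abstract does not give the details of CORE, so the following is only a minimal sketch of the general idea described above, not the published algorithm. It assumes that the error of each measurement is the standard error of the mean over repeats, and that error enters a k-means-style loop through inverse-variance weights. The function names (`summarize_replicates`, `weighted_distance`, `weighted_centroid`, `cluster`), the `floor` parameter, and the toy data are all hypothetical and chosen for illustration.

```python
import math
import statistics

def summarize_replicates(replicates):
    """replicates[g][c] is a list of repeat measurements for gene g in condition c.
    Returns (means, errors), each indexed as [gene][condition]."""
    means, errors = [], []
    for gene in replicates:
        means.append([statistics.fmean(r) for r in gene])
        # Standard error of the mean; assumes at least two repeats per condition.
        errors.append([statistics.stdev(r) / math.sqrt(len(r)) for r in gene])
    return means, errors

def weighted_distance(mean, err, centroid, floor=1e-6):
    """Squared distance in which each condition is divided by its error variance,
    so noisy measurements contribute less. `floor` avoids division by zero."""
    return sum((m - c) ** 2 / max(e * e, floor)
               for m, e, c in zip(mean, err, centroid))

def weighted_centroid(means, errors, members, floor=1e-6):
    """Inverse-variance weighted average of the member profiles, per condition."""
    centroid = []
    for j in range(len(means[0])):
        weights = [1.0 / max(errors[g][j] ** 2, floor) for g in members]
        total = sum(w * means[g][j] for w, g in zip(weights, members))
        centroid.append(total / sum(weights))
    return centroid

def cluster(means, errors, k, n_iter=20):
    """k-means-style loop, seeded with the first k genes to keep the sketch deterministic."""
    centroids = [list(means[g]) for g in range(k)]
    labels = [0] * len(means)
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the error-weighted distance.
        labels = [
            min(range(k),
                key=lambda c: weighted_distance(means[g], errors[g], centroids[c]))
            for g in range(len(means))
        ]
        # Update step: recompute each non-empty cluster's weighted centroid.
        for c in range(k):
            members = [g for g, lab in enumerate(labels) if lab == c]
            if members:
                centroids[c] = weighted_centroid(means, errors, members)
    return labels, centroids

# Toy data (invented): 4 genes, 2 conditions, 3 repeats each.
replicates = [
    [[1.0, 1.2, 0.8], [5.0, 5.1, 4.9]],  # gene 0: low, then high
    [[9.0, 9.2, 8.8], [1.0, 0.9, 1.1]],  # gene 1: high, then low
    [[1.1, 0.9, 1.0], [4.8, 5.2, 5.0]],  # gene 2: resembles gene 0
    [[8.9, 9.1, 9.0], [1.2, 0.8, 1.0]],  # gene 3: resembles gene 1
]
means, errors = summarize_replicates(replicates)
labels, centroids = cluster(means, errors, k=2)
print(labels)
```

Dividing by the error variance is one simple way to let reliable measurements dominate the assignment; a model-based formulation would be an equally reasonable choice, and the abstract does not say which the author uses.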

CONCLUSION

The additional information provided by replicate gene expression measurements is a valuable asset in effective clustering. Gene expression profiles with high errors, as determined from repeat measurements, may be unreliable and may associate with different clusters, whereas gene expression profiles with low errors can be clustered with higher specificity. Results indicate that including error information from repeat gene expression measurements can lead to significant improvements in clustering accuracy.
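The abstract does not say how clustering accuracy is measured. As an illustration only, the sketch below computes the adjusted Rand index, one widely used external criterion for comparing a clustering against known class labels; treating it as the accuracy measure here is an assumption, and the function name `adjusted_rand_index` is hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(truth, predicted):
    """Chance-corrected agreement between two labelings of the same items.
    1.0 means identical partitions; values near 0 mean chance-level agreement."""
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs of items placed together in both, in truth only, in predicted only.
    sum_cells = sum(comb(v, 2) for v in Counter(zip(truth, predicted)).values())
    sum_rows = sum(comb(v, 2) for v in Counter(truth).values())
    sum_cols = sum(comb(v, 2) for v in Counter(predicted).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)

# The index ignores label names, so a relabeled identical partition scores 1.0.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

On synthetic data, where the true cluster of every profile is known, a score like this could be compared between a clustering that uses error information and one that does not.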

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e0e3/1360687/0de218ea4352/1471-2105-7-17-1.jpg
