
Modeling gene expression measurement error: a quasi-likelihood approach.

Author information

Korbinian Strimmer

Affiliation

Department of Statistics, University of Munich, Ludwigstrasse 33, D-80539 Munich, Germany.

Publication information

BMC Bioinformatics. 2003 Mar 20;4:10. doi: 10.1186/1471-2105-4-10.

DOI:10.1186/1471-2105-4-10

PMID:12659637

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC153502/

Abstract

BACKGROUND

Using suitable error models for gene expression measurements is essential in the statistical analysis of microarray data. However, the true probabilistic model underlying gene expression intensity readings is generally not known. Instead, in currently used approaches some simple parametric model is assumed (usually a transformed normal distribution) or the empirical distribution is estimated. However, both these strategies may not be optimal for gene expression data, as the non-parametric approach ignores known structural information whereas the fully parametric models run the risk of misspecification. A further related problem is the choice of a suitable scale for the model (e.g. observed vs. log-scale).

RESULTS

Here a simple semi-parametric model for gene expression measurement error is presented. In this approach inference is based on an approximate likelihood function (the extended quasi-likelihood). Only partial knowledge about the unknown true distribution is required to construct this function. In the case of gene expression this information is available in the form of the postulated (e.g. quadratic) variance structure of the data. As the quasi-likelihood behaves (almost) like a proper likelihood, it allows for the estimation of calibration and variance parameters, and it is also straightforward to obtain corresponding approximate confidence intervals. Unlike most other frameworks, it also allows analysis on any preferred scale, i.e. both on the original linear scale as well as on a transformed scale. It can also be employed in regression approaches to model systematic (e.g. array or dye) effects.
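The abstract does not state the formula, so the following is only a minimal sketch of how an extended quasi-likelihood can be built from a quadratic variance structure. It assumes the standard Nelder-Pregibon form Q+(y; mu) = -1/2 log(2 pi V(y)) - D(y; mu)/2 with quasi-deviance D(y; mu) = 2 * integral from mu to y of (y - t)/V(t) dt, and a hypothetical parameterization V(mu) = a + b*mu^2 with the dispersion absorbed into a and b. The function names, parameter values, and data are illustrative and are not taken from the paper.

```python
import numpy as np

def quadratic_variance(mu, a, b):
    # Assumed variance structure: additive part a plus multiplicative part b * mu^2.
    return a + b * mu**2

def deviance(y, mu, a, b):
    # Closed form of D = 2 * integral_mu^y (y - t) / (a + b t^2) dt, for a > 0, b > 0.
    s = np.sqrt(b / a)
    atan_term = y / np.sqrt(a * b) * (np.arctan(s * y) - np.arctan(s * mu))
    log_term = (np.log(a + b * y**2) - np.log(a + b * mu**2)) / (2.0 * b)
    return 2.0 * (atan_term - log_term)

def extended_quasi_loglik(y, mu, a, b):
    # Sum over observations of -1/2 log(2 pi V(y)) - D(y; mu) / 2.
    d = deviance(y, mu, a, b)
    return np.sum(-0.5 * np.log(2.0 * np.pi * quadratic_variance(y, a, b)) - 0.5 * d)

# Illustrative replicate intensities for one gene (made-up numbers).
y_obs = np.array([120.0, 95.0, 150.0, 110.0])
a, b = 400.0, 0.04

# Profile the extended quasi-likelihood over a grid of candidate means.
grid = np.linspace(50.0, 200.0, 1501)
eql = np.array([extended_quasi_loglik(y_obs, m, a, b) for m in grid])
mu_hat = grid[np.argmax(eql)]
```

With a and b held fixed and a single common mean, the quasi-score is sum(y_i - mu)/V(mu), so the grid maximizer should land at the sample mean up to grid resolution. Estimating a and b as well would require maximizing the same function over all three parameters, which this sketch does not attempt.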

CONCLUSIONS

The quasi-likelihood framework provides a simple and versatile approach to analyze gene expression data that does not make any strong distributional assumptions about the underlying error model. For several simulated as well as real data sets it provides a better fit to the data than competing models. In an example it also improved the power of tests to identify differential expression.





