

β-empirical Bayes inference and model diagnosis of microarray data.

Author information

Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo 113-8657, Japan.

Publication information

BMC Bioinformatics. 2012 Jun 19;13:135. doi: 10.1186/1471-2105-13-135.

Abstract

BACKGROUND

Microarray data enable high-throughput surveys of mRNA expression profiles at the genomic level; however, the data present a challenging statistical problem because a large number of transcripts are measured on a small number of samples. To reduce the dimensionality, various Bayesian or empirical Bayes hierarchical models have been developed. However, because of the complexity of microarray data, no model can explain the data fully. It is generally difficult to scrutinize irregular expression patterns that the usual gene-by-gene statistical models do not anticipate.

RESULTS

As an extension of empirical Bayes (EB) procedures, we have developed the β-empirical Bayes (β-EB) approach based on a β-likelihood measure, which can be regarded as an 'evidence-based' weighted (quasi-) likelihood inference. The weight of a transcript t is a power function of its likelihood, f(y_t|θ)^β. Genes with low likelihoods have unexpected expression patterns and therefore receive low weights. Because outliers are assigned low weights, the inference becomes robust. The value of β, which controls the balance between robustness and efficiency, is selected by maximizing the predictive β₀-likelihood by cross-validation. The proposed β-EB approach identified six significant (p<10⁻⁵) contaminated transcripts as differentially expressed (DE) between normal and tumor tissues from the head and neck of cancer patients. These six genes were all confirmed to be related to cancer; they were not identified as DE genes by the classical EB approach. When applied to the eQTL analysis of Arabidopsis thaliana, the proposed β-EB approach identified some potential master regulators that were missed by the EB approach.
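The weighting idea can be illustrated with a toy example. The sketch below is not the hierarchical β-EB model of the paper; it assumes a single normal mean with a known standard deviation, and the function names (`normal_pdf`, `beta_weighted_mean`) and the data are hypothetical. Each observation is weighted by its likelihood raised to the power β, and the mean is re-estimated as the weighted average until it stabilizes. With β = 0 all weights equal 1 and the ordinary sample mean is recovered.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def median(values):
    """Sample median, used only as a robust starting value."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])


def beta_weighted_mean(ys, sigma=1.0, beta=0.5, n_iter=100, tol=1e-10):
    """Estimate a normal mean with weights f(y | mu)^beta (sigma assumed known).

    Fixed-point iteration: compute weights at the current mu, then update mu
    to the weighted average of the observations.
    """
    mu = median(ys)
    for _ in range(n_iter):
        weights = [normal_pdf(y, mu, sigma) ** beta for y in ys]
        new_mu = sum(w * y for w, y in zip(weights, ys)) / sum(weights)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    # Recompute the weights at the final estimate for inspection.
    weights = [normal_pdf(y, mu, sigma) ** beta for y in ys]
    return mu, weights


# Hypothetical data: five well-behaved values and one gross outlier.
ys = [-0.3, 0.1, 0.4, -0.2, 0.0, 8.0]
mu_plain, _ = beta_weighted_mean(ys, beta=0.0)         # ordinary mean
mu_robust, weights = beta_weighted_mean(ys, beta=0.5)  # outlier down-weighted
```

In this toy setting the outlier pulls the ordinary mean well away from the bulk of the data, whereas the β-weighted estimate stays close to it because the outlier's weight is nearly zero. Larger β gives more robustness at the cost of efficiency, which is the trade-off the abstract says is tuned by cross-validation.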

CONCLUSIONS

The simulation data and real gene expression data showed that the proposed β-EB method was robust against outliers. The distribution of the weights was used to scrutinize the irregular patterns of expression and diagnose the model statistically. When β-weights outside the range of the predicted distribution were observed, a detailed inspection of the data was carried out. The β-weights described here can be applied to other likelihood-based statistical models for diagnosis, and may serve as a useful tool for transcriptome and proteome studies.
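One way to picture the weight-based diagnosis is a simulation check. This is a rough sketch under stated assumptions rather than the procedure used in the paper: it assumes a fitted normal model with known `mu` and `sigma`, approximates the predicted range of the weights by simulating from that model, and flags observations whose weights fall below the simulated minimum. All names and data here are hypothetical.

```python
import math
import random


def beta_weight(y, mu, sigma, beta):
    """Weight of one observation: normal density raised to the power beta."""
    z = (y - mu) / sigma
    density = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return density ** beta


def simulated_weight_range(mu, sigma, beta, n_draws=10000, seed=0):
    """Approximate the range of weights expected if the model were correct."""
    rng = random.Random(seed)
    sims = [beta_weight(rng.gauss(mu, sigma), mu, sigma, beta)
            for _ in range(n_draws)]
    return min(sims), max(sims)


def flag_low_weights(ys, mu, sigma, beta, n_draws=10000, seed=0):
    """Indices of observations whose weight lies below the simulated range."""
    lower, _ = simulated_weight_range(mu, sigma, beta, n_draws, seed)
    return [i for i, y in enumerate(ys)
            if beta_weight(y, mu, sigma, beta) < lower]


# Hypothetical data: the last value is far from the fitted model N(0, 1).
ys = [-0.3, 0.1, 0.4, -0.2, 0.0, 8.0]
flagged = flag_low_weights(ys, mu=0.0, sigma=1.0, beta=0.5)
```

Flagged indices mark the observations that deserve a closer look. A lower quantile of the simulated weights could be used in place of the minimum if a less conservative threshold is wanted.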


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d0f9/3464654/cd666f9a31b2/1471-2105-13-135-1.jpg
