

Empirical null estimation using zero-inflated discrete mixture distributions and its application to protein domain data.

Author information

Gauran Iris Ivy M, Park Junyong, Lim Johan, Park DoHwan, Zylstra John, Peterson Thomas, Kann Maricel, Spouge John L

Affiliations

Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, Maryland 21250, U.S.A.

School of Statistics, University of the Philippines Diliman, Quezon City, 1101, Philippines.

Publication information

Biometrics. 2018 Jun;74(2):458-471. doi: 10.1111/biom.12779. Epub 2017 Sep 22.

Abstract

In recent mutation studies, analyses based on protein domain positions are gaining popularity over gene-centric approaches, since the latter are limited in accounting for the functional context that the position of a mutation provides. This presents a large-scale simultaneous inference problem, with hundreds of hypothesis tests to consider at the same time. This article aims to select significant mutation counts while controlling a given level of Type I error via False Discovery Rate (FDR) procedures. One main assumption is that the mutation counts follow a zero-inflated model, which accounts for both the true zeros of the count model and the excess zeros. The class of models considered is the Zero-inflated Generalized Poisson (ZIGP) distribution. Furthermore, we assume that there exists a cut-off value such that counts smaller than this value are generated from the null distribution. We present several data-dependent methods to determine the cut-off value. We also consider a two-stage procedure based on a screening process, so that mutation counts exceeding a certain value are considered significant. Simulated and protein domain data sets are used to illustrate this procedure in estimating the empirical null with a mixture of discrete distributions. Overall, the proposed two-stage testing procedure maintains control of the FDR and has superior empirical power.
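The sketch below is a rough illustration of the testing step described above. It makes several assumptions that are not the paper's method:

- The ZIGP uses Consul's generalized Poisson parametrization, with θ > 0, 0 ≤ λ < 1 and zero-inflation weight 0 ≤ π < 1.
- The null parameters are treated as known.
- The FDR procedure is the standard Benjamini-Hochberg step-up rule.

Under those assumptions the code computes ZIGP probabilities and converts each observed count into an upper-tail p-value P(Y ≥ y). It then applies Benjamini-Hochberg to the resulting p-values. The paper itself estimates the empirical null from counts below a data-dependent cut-off and adds a two-stage screening step; neither is reproduced here. The function names and the example parameter values are hypothetical.

```python
import math
from typing import List


def zigp_logpmf(y: int, theta: float, lam: float, pi: float) -> float:
    """Log P(Y = y) for a zero-inflated generalized Poisson (assumed Consul form).

    GP part: theta * (theta + lam*y)^(y-1) * exp(-theta - lam*y) / y!
    Zero inflation: an extra point mass pi at zero.
    """
    if y < 0:
        return float("-inf")
    log_gp = (math.log(theta) + (y - 1) * math.log(theta + lam * y)
              - theta - lam * y - math.lgamma(y + 1))
    if y == 0:
        # Structural zeros (weight pi) plus sampling zeros from the GP component.
        return math.log(pi + (1.0 - pi) * math.exp(log_gp))
    return math.log1p(-pi) + log_gp


def zigp_pmf(y: int, theta: float, lam: float, pi: float) -> float:
    return math.exp(zigp_logpmf(y, theta, lam, pi))


def upper_tail_pvalue(y: int, theta: float, lam: float, pi: float) -> float:
    """P(Y >= y) under the assumed null, computed as 1 - P(Y <= y - 1)."""
    cdf_below = math.fsum(zigp_pmf(k, theta, lam, pi) for k in range(y))
    return min(1.0, max(0.0, 1.0 - cdf_below))


def benjamini_hochberg(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up rule: True marks a rejected null hypothesis."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0  # largest rank k with p_(k) <= k * alpha / m
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-position mutation counts and assumed null parameters.
    counts = [0, 0, 1, 3, 12, 0, 2, 9]
    pvals = [upper_tail_pvalue(c, theta=1.0, lam=0.2, pi=0.4) for c in counts]
    print(benjamini_hochberg(pvals, alpha=0.05))
```

Tail probabilities are computed as one minus a running sum, which is adequate for moderate counts. For very large counts, summing the upper tail directly would avoid cancellation.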



