Exact Integral Formulas for False Discovery Rate and the Variance of False Discovery Proportion.

Affiliation

Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, 301 University Blvd, Galveston, Texas 77555, United States.

Publication information

J Proteome Res. 2024 Jun 7;23(6):2298-2305. doi: 10.1021/acs.jproteome.3c00842. Epub 2024 May 29.

DOI:10.1021/acs.jproteome.3c00842

PMID:38809146

Abstract

Multiple hypothesis testing is an integral component of data analysis for large-scale technologies such as proteomics, transcriptomics, and metabolomics, where the false discovery rate (FDR) and the positive FDR (pFDR) have been adopted as measures for estimating and controlling errors. The pFDR is the expectation of the false discovery proportion (FDP), defined as the ratio of the number of rejected true null hypotheses (false discoveries) to the total number of rejected hypotheses. In practice, the expectation of this ratio is approximated by the ratio of expectations; however, the conditions under which the former can be replaced by the latter have not been investigated. This work derives exact integral expressions for the expectation (pFDR) and the variance of FDP. The widely used approximation (the ratio of expectations) is shown to be a special case of the integral formula for pFDR, obtained in the limit of a large sample size. A recurrence formula is provided to compute the pFDR for a predefined number of null hypotheses. The variance of FDP is approximated for a practical application: peptide identification using forward and reversed protein sequences. Simulations show that the integral expression is more accurate than the approximate formula when the number of hypotheses is small. For large sample sizes, the pFDRs obtained from the integral expression and from the approximation do not differ substantially. Applications to proteomics data sets are included.
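The difference between the expectation of a ratio (pFDR) and the ratio of expectations can be illustrated numerically. The sketch below is a Monte Carlo illustration under simplifying assumptions that are not taken from the paper: tests are independent, a fixed p-value threshold t is used, each true null is rejected with probability t, and each alternative is rejected with a hypothetical probability alt_power. It estimates pFDR = E[V/R | R > 0] and Var(FDP) by simulation and compares pFDR with E[V]/E[R]. It does not implement the authors' integral or recurrence formulas, and the function name simulate_fdp and its parameters are hypothetical.

```python
import random
import statistics


def simulate_fdp(m0, m1, t, alt_power, n_rep=5000, seed=1):
    """Monte Carlo estimate of pFDR = E[V/R | R > 0] and Var(FDP | R > 0).

    Illustrative assumptions (not from the paper):
      - m0 true nulls, each rejected independently with probability t
        (uniform null p-values thresholded at t);
      - m1 alternatives, each rejected independently with probability
        alt_power (a hypothetical per-test power).
    Returns (pfdr, var_fdp, ratio_of_expectations).
    """
    rng = random.Random(seed)
    fdps = []
    for _ in range(n_rep):
        v = sum(rng.random() <= t for _ in range(m0))          # false discoveries V
        s = sum(rng.random() <= alt_power for _ in range(m1))  # true discoveries S
        r = v + s                                               # total rejections R
        if r > 0:  # pFDR conditions on at least one rejection
            fdps.append(v / r)
    pfdr = statistics.fmean(fdps)        # expectation of the ratio
    var_fdp = statistics.pvariance(fdps)  # sample variance of FDP
    # Ratio of expectations E[V] / E[R] under the same assumptions
    ratio = m0 * t / (m0 * t + m1 * alt_power)
    return pfdr, var_fdp, ratio


if __name__ == "__main__":
    for m in (5, 500):
        pfdr, var_fdp, ratio = simulate_fdp(m, m, 0.05, 0.5)
        print(f"m0=m1={m}: pFDR~{pfdr:.4f}, Var(FDP)~{var_fdp:.5f}, E[V]/E[R]={ratio:.4f}")
```

Both settings share the same ratio of expectations (25/275, about 0.091), so comparing the simulated pFDR against it for m0 = m1 = 5 and m0 = m1 = 500 gives a rough, assumption-laden feel for the abstract's point that the approximation is adequate for large samples but can be less accurate for few hypotheses.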
