The bootstrap: a technique for data-driven statistics. Using computer-intensive analyses to explore experimental data.

Author information

Henderson A Ralph

Affiliation

Department of Biochemistry, University of Western Ontario, London, Ontario, Canada N6A 5C1.

Publication information

Clin Chim Acta. 2005 Sep;359(1-2):1-26. doi: 10.1016/j.cccn.2005.04.002.

Abstract

BACKGROUND

The concept of resampling data, more commonly referred to as bootstrapping, has been in use for more than three decades. Bootstrapping has considerable theoretical advantages when it is applied to non-Gaussian data. Most of the published literature is concerned with the mathematical aspects of the bootstrap, but the technique is increasingly being used in medical and other fields.

METHODS

I reviewed the published literature following a 1994 publication assessing the transfer of technology, including the bootstrap, to the biomedical literature.

RESULTS

In the ten-year period following that 1994 paper there were 1679 published references to the technique in Medline. In the same period the following numbers of citations were found in the four major medical journals: British Medical Journal (48), JAMA (51), Lancet (52), and the New England Journal of Medicine (45).

CONTENT

I introduce the basic theory of the bootstrap, the jackknife, and permutation tests. The bootstrap is used to estimate measures of an estimator's accuracy, such as its standard error, a confidence interval, or its bias. The technique may be useful for analysing small, expensive-to-collect data sets where prior information is sparse, distributional assumptions are unclear, and further data may be difficult to acquire. Some elementary uses of bootstrapping are illustrated: calculating confidence intervals (for reference ranges or experimental findings); hypothesis testing (comparing experimental findings); linear regression and correlation (studying association and prediction of variables); non-linear regression (as used in immunoassay techniques); and ROC curve processing.
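
As an illustration of the idea described above, the following is a minimal sketch of a nonparametric bootstrap that estimates the standard error, bias, and a percentile confidence interval for a statistic. It is not taken from the article: the function name `bootstrap`, its parameters, the choice of the percentile interval (one of several bootstrap interval methods), and the sample values are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap(data, estimator, n_resamples=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap of `estimator` over `data` (illustrative helper).

    Returns (standard_error, bias, (ci_low, ci_high)), where the
    confidence interval uses the simple percentile method.
    """
    rng = random.Random(seed)
    n = len(data)
    observed = estimator(data)

    # Draw n values with replacement and recompute the statistic each time.
    replicates = sorted(
        estimator(rng.choices(data, k=n)) for _ in range(n_resamples)
    )

    # Spread of the replicates approximates the estimator's standard error.
    std_error = statistics.stdev(replicates)
    # Bias estimate: mean of the replicates minus the observed statistic.
    bias = statistics.fmean(replicates) - observed

    # Percentile interval: empirical alpha/2 and 1 - alpha/2 quantiles,
    # with indices clamped to the valid range.
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return std_error, bias, (replicates[lo_idx], replicates[hi_idx])


# Hypothetical small, skewed sample (values invented for illustration).
sample = [1.2, 1.9, 2.1, 2.4, 2.8, 3.0, 3.3, 4.1, 5.6, 9.7]
se, bias, (low, high) = bootstrap(sample, statistics.median)
print(f"SE={se:.3f}  bias={bias:.3f}  95% CI=({low:.2f}, {high:.2f})")
```

The median is used here because it has no simple closed-form standard error for a small non-Gaussian sample, which is the kind of situation the abstract has in mind. Any function of a list of numbers could be passed as `estimator` instead.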

CONCLUSIONS

These techniques can supplement current nonparametric statistical methods and should be included, where appropriate, in the armamentarium of data processing methodologies.
