A comparison of several methods for analyzing censored data.

Hewett Paul, Ganser Gary H

Exposure Assessment Solutions, Inc., Morgantown, West Virginia, USA.

Ann Occup Hyg. 2007 Oct;51(7):611-32. doi: 10.1093/annhyg/mem045.

The purpose of this study was to compare the performance of several methods for statistically analyzing censored datasets [i.e. datasets that contain measurements that are less than the field limit-of-detection (LOD)] when estimating the 95th percentile and the mean of right-skewed occupational exposure data. The methods examined were several variations on the maximum likelihood estimation (MLE) and log-probit regression (LPR) methods, the common substitution methods, several non-parametric (NP) quantile methods for the 95th percentile and the NP Kaplan-Meier (KM) method. Each method was challenged with computer-generated censored datasets for a variety of plausible scenarios where the following factors were allowed to vary randomly within fairly wide ranges: the true geometric standard deviation, the censoring point or LOD and the sample size. This was repeated for both a single-laboratory scenario (i.e. single LOD) and a multiple-laboratory scenario (i.e. three LODs) as well as a single lognormal distribution scenario and a contaminated lognormal distribution scenario. Each method was used to estimate the 95th percentile and mean for the censored datasets (the NP quantile methods estimated only the 95th percentile). For each scenario, the method bias and overall imprecision (as indicated by the root mean square error or rMSE) were calculated for the 95th percentile and mean. No single method was unequivocally superior across all scenarios, although nearly all of the methods excelled in one or more scenarios. Overall, only the MLE- and LPR-based methods performed well across all scenarios, with the robust versions generally showing less bias than the standard versions when challenged with a contaminated lognormal distribution and multiple LODs. All of the MLE- and LPR-based methods were remarkably robust to departures from the lognormal assumption, nearly always having lower rMSE values than the NP methods for the exposure scenarios postulated. 
In general, the MLE methods tended to have smaller rMSE values than the LPR methods, particularly for the small sample size scenarios. The substitution methods tended to be strongly biased, but in some scenarios had the smaller rMSE values, especially for sample sizes <20. Surprisingly, the various NP methods were not as robust as expected, performing poorly in the contaminated distribution scenarios for both the 95th percentile and the mean. In conclusion, when using the rMSE rather than bias as the preferred comparison metric, the standard MLE method consistently outperformed the so-called robust variations of the MLE-based and LPR-based methods, as well as the various NP methods, for both the 95th percentile and the mean. When estimating the mean, the standard LPR method tended to outperform the robust LPR-based methods. Whenever bias is the main consideration, the robust MLE-based methods should be considered. The KM method, currently hailed by some as the preferred method for estimating the mean when the lognormal distribution assumption is questioned, did not perform well for either the 95th percentile or mean and is not recommended.
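
The abstract names the methods but gives no implementation detail. As an illustration only, the following is a minimal sketch of a maximum likelihood fit to left-censored lognormal data, followed by plug-in estimates of the 95th percentile and the mean. It assumes an EM iteration on the log scale, which is one common way to maximize the censored likelihood and is not necessarily the procedure the authors used. The function names (`fit_censored_lognormal`, `lognormal_summaries`), the LOD/2 starting values and the simulated dataset are all hypothetical choices made for this example.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal: pdf, cdf and quantiles


def fit_censored_lognormal(detects, nondetect_lods, iters=500, tol=1e-10):
    """EM fit of a lognormal distribution to left-censored data.

    detects: measured values at or above their LOD.
    nondetect_lods: one LOD per non-detect (LODs may differ between labs).
    Returns (mu, sigma) on the natural-log scale.
    """
    y = [math.log(x) for x in detects]
    c = [math.log(lod) for lod in nondetect_lods]
    n = len(y) + len(c)
    # Starting values: place each non-detect at LOD/2 (assumption for this sketch).
    start = y + [ci - math.log(2.0) for ci in c]
    mu = sum(start) / n
    sigma = max(math.sqrt(sum((v - mu) ** 2 for v in start) / n), 1e-3)
    sum_y = sum(y)
    sum_y2 = sum(v * v for v in y)
    for _ in range(iters):
        s1, s2 = sum_y, sum_y2
        for ci in c:
            # E-step: first and second moments of a normal truncated above at ci.
            z = (ci - mu) / sigma
            lam = _STD.pdf(z) / max(_STD.cdf(z), 1e-300)
            s1 += mu - sigma * lam
            s2 += mu * mu + sigma * sigma - sigma * lam * (ci + mu)
        # M-step: update the parameters from the completed moments.
        new_mu = s1 / n
        new_sigma = math.sqrt(max(s2 / n - new_mu ** 2, 1e-12))
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


def lognormal_summaries(mu, sigma):
    """Plug-in 95th percentile and arithmetic mean of a lognormal."""
    p95 = math.exp(mu + _STD.inv_cdf(0.95) * sigma)
    mean = math.exp(mu + 0.5 * sigma ** 2)
    return p95, mean


# Simulated single-LOD scenario (all values below are illustrative).
rng = random.Random(1)
true_mu, true_sigma, lod = 0.0, 1.0, 0.5
sample = [rng.lognormvariate(true_mu, true_sigma) for _ in range(2000)]
detects = [x for x in sample if x >= lod]
nondetect_lods = [lod for x in sample if x < lod]

mu_hat, sigma_hat = fit_censored_lognormal(detects, nondetect_lods)
p95_hat, mean_hat = lognormal_summaries(mu_hat, sigma_hat)
print(f"mu={mu_hat:.3f} sigma={sigma_hat:.3f} p95={p95_hat:.3f} mean={mean_hat:.3f}")
```

Because `nondetect_lods` holds one limit per non-detect, the same function covers a multiple-LOD scenario without modification. Bias and rMSE of the kind compared in the study could be approximated by repeating the simulation many times and summarizing the errors of `p95_hat` and `mean_hat` against the true values.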

Similar articles

A comparison of several methods for analyzing censored data.

Ann Occup Hyg. 2007 Oct;51(7):611-32. doi: 10.1093/annhyg/mem045.

An accurate substitution method for analyzing censored data.

J Occup Environ Hyg. 2010 Apr;7(4):233-44. doi: 10.1080/15459621003609713.

Comparison of methods for analyzing left-censored occupational exposure data.

Ann Occup Hyg. 2014 Nov;58(9):1126-42. doi: 10.1093/annhyg/meu067. Epub 2014 Sep 26.

Estimating mean exposures from censored data: exposure to benzene in the Australian petroleum industry.

Ann Occup Hyg. 2001 Jun;45(4):275-82.

A Comparison of the β-Substitution Method and a Bayesian Method for Analyzing Left-Censored Data.

Ann Occup Hyg. 2016 Jan;60(1):56-73. doi: 10.1093/annhyg/mev049. Epub 2015 Jul 24.

Quantification of variability and uncertainty for censored data sets and application to air toxic emission factors.

Risk Anal. 2004 Aug;24(4):1019-34. doi: 10.1111/j.0272-4332.2004.00504.x.

Maximum likelihood estimates of mean and variance of occupation radiation doses subjected to minimum detection levels.

Radiat Prot Dosimetry. 2008;129(4):411-8. doi: 10.1093/rpd/ncm483. Epub 2007 Dec 14.

Evaluation of maximum likelihood procedures to estimate left censored observations.

Anal Chem. 2008 Feb 15;80(4):1124-32. doi: 10.1021/ac0711788. Epub 2008 Jan 16.

A comparison of methods to handle skew distributed cost variables in the analysis of the resource consumption in schizophrenia treatment.

J Ment Health Policy Econ. 2002 Mar;5(1):21-31.

Analysis of censored exposure data by constrained maximization of the Shapiro-Wilk W statistic.

Ann Occup Hyg. 2010 Apr;54(3):263-71. doi: 10.1093/annhyg/mep083. Epub 2009 Dec 2.

Cited by

Respirable dust and respirable crystalline silica exposures among workers at stone countertop fabrication shops in Georgia from 2017 through 2023.

Ann Work Expo Health. 2025 Apr 18. doi: 10.1093/annweh/wxaf014.

Blood Vitamin Concentrations in Pond Sliders () Under Human Care in Central Europe and Possible Seasonal and Sex-Specific Influences.

Animals (Basel). 2025 Mar 17;15(6):859. doi: 10.3390/ani15060859.

Comparison of 46 Cytokines in Peripheral Blood Between Patients with Papillary Thyroid Cancer and Healthy Individuals with AI-Driven Analysis to Distinguish Between the Two Groups.

Diagnostics (Basel). 2025 Mar 20;15(6):791. doi: 10.3390/diagnostics15060791.

Forest terpenes and stress: Examining the associations of filtered vs. non-filtered air in a real-life natural environment.

Environ Res. 2025 Jul 1;276:121482. doi: 10.1016/j.envres.2025.121482. Epub 2025 Mar 25.

Circulating sex hormones and volumetric breast density: A prospective study in women from the EPIC Florence cohort.

Int J Cancer. 2025 Jun 15;156(12):2294-2302. doi: 10.1002/ijc.35321. Epub 2024 Dec 29.

Spatial and Temporal Mapping of RF Exposure in an Urban Core Using Exposimeter and GIS.

Sensors (Basel). 2025 Feb 20;25(5):1301. doi: 10.3390/s25051301.

Exploring bias due to below-limit-of-detection values in influenza vaccine antibody modeling: A case study and instructional guide for the CIVIC study.

Vaccine. 2025 Mar 7;49:126802. doi: 10.1016/j.vaccine.2025.126802. Epub 2025 Feb 4.

Environmental and dietary factors associated with urinary OH-PAHs in mid-pregnancy in a large multi-site study.

Environ Res. 2025 Feb 1;266:120516. doi: 10.1016/j.envres.2024.120516. Epub 2024 Dec 2.

Phosphorus removal from irrigation return flow using an iron oxide filter and denitrifying pine bark bioreactor treatment train.

Environ Sci Pollut Res Int. 2024 Dec;31(58):66435-66444. doi: 10.1007/s11356-024-35641-4. Epub 2024 Dec 4.

Temporal distribution and ecological risk assessment for pesticides in water from the north-central coastal zone of Sinaloa, Mexico.

Heliyon. 2024 Jul 25;10(15):e35207. doi: 10.1016/j.heliyon.2024.e35207. eCollection 2024 Aug 15.
