Rascati K L, Smith M J, Neilands T
University of Texas College of Pharmacy, Austin 78712, USA.
Clin Ther. 2001 Mar;23(3):481-98. doi: 10.1016/s0149-2918(01)80052-7.
Cost data are often nonnormally distributed because of a few very high cost values that cannot necessarily be dismissed as outliers. Researchers have not reached agreement on how best to handle skewed cost data.
This study presents an example of skewed cost data that were collected retrospectively from the Texas Medicaid database. Common methods of dealing with skewed cost distributions are discussed. Data were analyzed using various methods, and the statistical results of each test were compared.
Prescription and medical claims data extracted from the Texas Medicaid database were analyzed using the Mann-Whitney U test and t tests of untransformed, log-transformed, and bootstrapped data.
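As a rough illustration of the t-test comparisons described above, the following sketch computes a two-sample t statistic on raw and on log-transformed costs. The Welch (unequal-variance) form, the function name `welch_t`, and the cost values are all assumptions made for illustration; the abstract does not state which t-test variant was used, and the numbers are not taken from the Texas Medicaid data.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom.

    Assumption: the unequal-variance form is used here because the
    abstract reports unequal variances; the study's exact variant is
    not stated.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical costs: group_a contains one very high-cost patient.
group_a = [100, 120, 130, 150, 160, 9000]
group_b = [200, 210, 220, 230, 240, 250]

# t test on untransformed costs
t_raw, df_raw = welch_t(group_a, group_b)

# t test on log-transformed costs (requires strictly positive costs)
log_a = [math.log(x) for x in group_a]
log_b = [math.log(x) for x in group_b]
t_log, df_log = welch_t(log_a, log_b)

print(f"raw costs: t = {t_raw:.3f}, df = {df_raw:.2f}")
print(f"log costs: t = {t_log:.3f}, df = {df_log:.2f}")
```

Note that a test on log-costs compares means on the log scale (geometric means), which is not the same question as comparing arithmetic mean costs.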
All distributions of untransformed cost data were nonnormal, and the comparison groups had unequal variances. The Mann-Whitney U test negated the effect of the high-cost patients and gave a significant result for overall cost differences between groups, but in the opposite direction from the difference in means. The t tests on raw and log-transformed data may not have been optimal because the distributions of both raw costs and log-costs were nonnormal.
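A small example may help show how a rank-based test can point in the opposite direction from the means. In the sketch below, the data are hypothetical and the helper `rank_sum` is an illustrative name; it computes the Mann-Whitney U statistic from midranks and does not compute a p-value.

```python
from statistics import mean

def rank_sum(sample, other):
    """Sum of midranks of `sample` within the pooled data (ties averaged)."""
    pooled = sorted(sample + other)

    def midrank(value):
        first = pooled.index(value) + 1          # 1-based rank of first tie
        last = first + pooled.count(value) - 1   # 1-based rank of last tie
        return (first + last) / 2

    return sum(midrank(v) for v in sample)

# Hypothetical costs: group_a is cheaper for most patients,
# but one very high-cost patient pulls its mean far above group_b.
group_a = [100, 120, 130, 150, 160, 9000]
group_b = [200, 210, 220, 230, 240, 250]

n_a, n_b = len(group_a), len(group_b)
r_a = rank_sum(group_a, group_b)
r_b = rank_sum(group_b, group_a)

# Mann-Whitney U statistics derived from the rank sums
u_a = r_a - n_a * (n_a + 1) / 2
u_b = r_b - n_b * (n_b + 1) / 2

print("mean A:", mean(group_a), "mean B:", mean(group_b))
print("mean rank A:", r_a / n_a, "mean rank B:", r_b / n_b)
print("U_A:", u_a, "U_B:", u_b)
```

Here group A has the higher mean cost but the lower mean rank, because the ranking reduces the single extreme value to "largest observation" regardless of its magnitude.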
The bootstrap method does not need to meet the assumptions of normality and equal variances. In analyses of small sample sizes with skewed cost data, the bootstrap method may offer an alternative to the more traditional nonparametric or log-transformation techniques.
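The bootstrap idea can be sketched as follows: resample each group with replacement many times and examine the distribution of the difference in mean costs. This is a generic percentile bootstrap, not necessarily the procedure used in the study; the function name `bootstrap_mean_diff`, the number of resamples, the seed, and the data are all assumptions made for illustration.

```python
import random
from statistics import mean

def bootstrap_mean_diff(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b).

    Each group is resampled independently, with replacement, at its
    original size. Returns (observed difference, lower, upper).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(a) - mean(b), lower, upper

# Hypothetical skewed cost data (not from the Texas Medicaid database)
group_a = [100, 120, 130, 150, 160, 9000]
group_b = [200, 210, 220, 230, 240, 250]

observed, lower, upper = bootstrap_mean_diff(group_a, group_b)
print(f"observed difference in means: {observed:.1f}")
print(f"95% percentile interval: ({lower:.1f}, {upper:.1f})")
```

Because the interval is read directly from the resampled differences, no normality or equal-variance assumption enters the calculation. With samples this small, though, the resamples can only reuse the few observed values, so the interval should be interpreted with caution.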