From the Department of Surgery and Perioperative Care, Dell Medical School at the University of Texas at Austin, Austin, Texas.
Anesth Analg. 2017 Oct;125(4):1375-1380. doi: 10.1213/ANE.0000000000002370.
Designing, conducting, analyzing, reporting, and interpreting the findings of a research study require an understanding of the types and characteristics of data and variables. Descriptive statistics are typically used simply to calculate, describe, and summarize the collected research data in a logical, meaningful, and efficient way. Inferential statistics allow researchers to make a valid estimate of the association between an intervention and the treatment effect in a specific population, based upon their randomly collected, representative sample data. Categorical data can be either dichotomous or polytomous. Dichotomous data have only 2 categories, and thus are considered binary. Polytomous data have more than 2 categories. Unlike dichotomous and polytomous data, ordinal data are rank ordered, typically based on a numerical scale that is composed of a small set of discrete classes or integers. Continuous data are measured on a continuum and can have any numeric value over this continuous range. Continuous data can be meaningfully divided into smaller and smaller or finer and finer increments, depending upon the precision of the measurement instrument. Interval data are a form of continuous data in which equal intervals represent equal differences in the property being measured. Ratio data are another form of continuous data, which have the same properties as interval data, plus a true definition of an absolute zero point, so that the ratios of the values on the measurement scale make sense. The normal (Gaussian) distribution ("bell-shaped curve") is one of the most common statistical distributions. Many applied inferential statistical tests are predicated on the assumption that the analyzed data follow a normal distribution. The histogram and the Q-Q plot are 2 graphical methods to assess whether a set of data follows a normal distribution (displays "normality").
The Shapiro-Wilk test and the Kolmogorov-Smirnov test are 2 well-known and historically widely applied quantitative methods to assess data normality. Parametric statistical tests make certain assumptions about the characteristics and/or parameters of the underlying population distribution upon which the test is based, whereas nonparametric tests make fewer or less rigorous assumptions. If a normality test indicates that the study data deviate significantly from a Gaussian distribution, then rather than applying a less robust nonparametric test, the problem can potentially be remedied by judiciously and openly (1) performing a data transformation of all the data values or (2) eliminating any obvious data outlier(s).
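The Q-Q plot and the data transformation described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not taken from the article: the simulated log-normal sample, the plotting positions (i - 0.5)/n, and the helper names `qq_points` and `qq_correlation` are all assumptions made for illustration. The correlation of the Q-Q points is used here as a rough numeric summary of how straight the plot is; it is not the Shapiro-Wilk or Kolmogorov-Smirnov statistic.

```python
import math
import random
from statistics import NormalDist, mean


def qq_points(values):
    """Return (theoretical, observed) quantile pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are one common convention (assumed here).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered


def qq_correlation(values):
    """Pearson correlation of the Q-Q points; values near 1 suggest normality."""
    x, y = qq_points(values)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Simulated right-skewed (log-normal) data standing in for real measurements.
rng = random.Random(42)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(200)]

# Log transformation applied to all data values, not to a selected subset.
transformed = [math.log(v) for v in raw]

r_raw = qq_correlation(raw)
r_log = qq_correlation(transformed)
print(f"Q-Q correlation, raw data:        {r_raw:.3f}")
print(f"Q-Q correlation, log-transformed: {r_log:.3f}")
```

Plotting the pairs returned by `qq_points` gives the Q-Q plot itself; points that fall close to a straight line are consistent with normality. For a formal test, SciPy provides `scipy.stats.shapiro` and `scipy.stats.kstest`, which could be applied to the same `raw` and `transformed` lists.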