Department of Clinical Immunology and Rheumatology, Sanjay Gandhi Postgraduate Institute of Medical Sciences (SGPGIMS), Lucknow, India.
Department of Internal Medicine No. 2, Danylo Halytsky Lviv National Medical University, Lviv, Ukraine.
Rheumatol Int. 2021 Jan;41(1):43-55. doi: 10.1007/s00296-020-04740-z. Epub 2020 Nov 17.
Statistical presentation of data is key to understanding patterns and drawing inferences about biomedical phenomena. In this article, we provide an overview of basic statistical considerations for data analysis. Assessing whether the tested parameters are normally distributed is important for deciding between parametric and non-parametric analyses. The nature of the variables (continuous or discrete) also determines the analysis strategy. Normally distributed data can be presented as means with standard deviations (SD), whereas non-parametric measures such as medians (with range or interquartile range) should be used for non-normal distributions. While the SD measures the dispersion of the data, the standard error is used to estimate the 95% confidence interval, i.e. the range likely to contain the true population mean. Univariable analyses should report effect sizes as well as test a priori hypotheses (i.e. null hypothesis significance testing), and should be followed by suitably adjusted multivariable analyses such as linear or logistic regression. Linear correlation statistics can help assess whether two variables change together. Concordance rather than correlation should be used to compare outcome measures of disease states. A prior sample size calculation to ensure adequate study power is recommended for studies that have analogues in the literature reporting SDs. Statistical considerations for systematic reviews should include appropriate use of meta-analysis, assessment of heterogeneity, assessment of publication bias when more than ten studies are included, and quality assessment of the studies. Since statistical errors account for a significant proportion of retractions, appropriate statistical analysis is mandatory during both study planning and data analysis.
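The distinction between the SD and the standard error can be illustrated with a short sketch. This is not taken from the article: the function name `summarize` and the sample values are hypothetical, and the 95% confidence interval uses a normal approximation (z of about 1.96), which is an assumption that may be too narrow for small samples, where a t-distribution is usually preferred.

```python
import math
import statistics


def summarize(values):
    """Return descriptive statistics for a list of numbers (illustrative helper)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator): spread of the data
    se = sd / math.sqrt(n)         # standard error: precision of the estimated mean

    # Assumption: normal approximation for the 95% CI of the mean.
    z = statistics.NormalDist().inv_cdf(0.975)
    ci95 = (mean - z * se, mean + z * se)

    # Non-parametric summary, suited to non-normal data.
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)

    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "se": se,
        "ci95": ci95,
        "median": median,
        "iqr": (q1, q3),
    }


# Hypothetical measurements, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in summarize(sample).items():
    print(name, value)
```

The SD describes how widely individual observations vary, while the standard error shrinks as the sample size grows, so the two should not be reported interchangeably.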