Gaddis M L, Gaddis G M
Department of Surgery, University of Missouri-Kansas City School of Medicine, Truman Medical Center 64108.
Ann Emerg Med. 1990 Dec;19(12):1462-8. doi: 10.1016/s0196-0644(05)82622-8.
Correlation and regression analysis are applied to data to define and quantify the relationship between two variables. Correlation analysis is used to estimate the strength of a relationship between two variables. The correlation coefficient r is a dimensionless number ranging from -1 to +1. A value of -1 signifies a perfect negative, or indirect (inverse), relationship. A value of +1 signifies a perfect positive, or direct, relationship. The r can be calculated as the Pearson product-moment r, using normally distributed interval or ratio data, or as the Spearman rank r, using non-normally distributed data that are not interval or ratio in nature. Linear regression analysis results in the formation of an equation of a line (Y = mX + b), which mathematically describes the line of best fit for a data relationship between X and Y variables. This equation can then be used to predict additional dependent variable values (Y), based on the value of the independent variable X, the slope m, and the Y-intercept b. Interpretation of the correlation coefficient r involves use of r², which estimates the proportion of the variability in Y that is accounted for by X. Tests of significance for linear regression are conceptually similar to significance testing using analysis of variance. Multiple correlation and regression, more complex analytical methods that define relationships between three or more variables, are not covered in this article. Closing comments for this final installment of this introduction to biostatistics series are presented.
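The quantities named in the abstract (Pearson r, Spearman rank r, the least-squares line Y = mX + b, and r²) can be illustrated with a minimal Python sketch. This is not taken from the article: the function names (`pearson_r`, `average_ranks`, `spearman_r`, `fit_line`) and the five data points are hypothetical, chosen only for illustration, and the sketch assumes paired samples of equal length with non-zero variance.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def fit_line(x, y):
    """Least-squares slope m and intercept b for Y = mX + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Hypothetical illustrative data, not from the article.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
rho = spearman_r(x, y)
m, b = fit_line(x, y)
r_squared = r ** 2          # proportion of variability in Y accounted for by X
predicted = m * 6.0 + b     # predicted Y for a new X value of 6

print(f"r = {r:.4f}, rho = {rho:.4f}, r^2 = {r_squared:.4f}")
print(f"Y = {m:.2f}X + {b:.2f}; predicted Y at X = 6: {predicted:.2f}")
```

In this sketch the Spearman coefficient is obtained by ranking each variable and applying the Pearson formula to the ranks, which is one common way to compute it. Significance testing of the slope or of r is deliberately left out.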