Farrell Mary Beth
Intersocietal Accreditation Commission
J Nucl Med Technol. 2018 Jun;46(2):76-80. doi: 10.2967/jnmt.117.204719. Epub 2018 Feb 2.
This article is the second part of a continuing education series reviewing basic statistics that nuclear medicine and molecular imaging technologists should understand. In this article, the statistics for evaluating interpretation accuracy, significance, and variance are discussed. Throughout the article, actual statistics are pulled from the published literature. We begin by explaining 2 methods for quantifying interpretive accuracy: interreader and intrareader reliability. Agreement among readers can be expressed simply as a percentage. However, the Cohen κ-statistic is a more robust measure of agreement that accounts for chance. The higher the κ-statistic is, the higher is the agreement between readers. When 3 or more readers are being compared, the Fleiss κ-statistic is used. Significance testing determines whether the difference between 2 conditions or interventions is meaningful. Statistical significance is usually expressed using a number called a probability (P) value. Calculation of the P value is beyond the scope of this review. However, knowing how to interpret P values is important for understanding the scientific literature. Generally, a P value of less than 0.05 is considered significant and indicates that the results of the experiment are due to more than just chance. Variance, standard deviation (SD), confidence interval, and standard error (SE) explain the dispersion of data around a mean of a sample drawn from a population. SD is commonly reported in the literature. A small SD indicates that there is not much variation in the sample data. Many biologic measurements fall into what is referred to as a normal distribution taking the shape of a bell curve. In a normal distribution, about 68% of the data will fall within 1 SD of the mean, 95% will fall within 2 SDs, and 99.7% will fall within 3 SDs. Confidence interval defines the range of possible values within which the population parameter is likely to lie and gives an idea of the precision of the statistic being measured.
A wide confidence interval indicates that if the experiment were repeated multiple times on other samples, the measured statistic would lie within a wide range of possibilities. The confidence interval relies on the SE.
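As an illustration of two quantities named in the abstract, the following minimal Python sketch computes the Cohen κ-statistic for 2 readers and a confidence interval built from the SE of a sample mean. It is not taken from the article: the function names, the example readings, and the use of z = 1.96 (a normal approximation for a 95% interval, which tends to be too narrow for small samples) are assumptions made here for illustration.

```python
import math
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen kappa for two readers who rated the same cases (hypothetical helper)."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must rate the same non-empty set of cases")
    n = len(reader_a)
    # Observed agreement: fraction of cases on which the two readers match.
    p_observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: expected matches given each reader's category frequencies.
    counts_a = Counter(reader_a)
    counts_b = Counter(reader_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


def mean_confidence_interval(values, z=1.96):
    """Return mean, sample SD, SE, and a (low, high) interval of mean +/- z * SE.

    z = 1.96 is an assumed normal-approximation multiplier for a 95% interval.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least 2 values are needed to estimate the SD")
    mean = sum(values) / n
    # Sample variance uses n - 1 in the denominator.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    sd = math.sqrt(variance)
    se = sd / math.sqrt(n)  # SE shrinks as the sample size grows
    return mean, sd, se, (mean - z * se, mean + z * se)


if __name__ == "__main__":
    # Made-up readings for 10 studies: 1 = abnormal, 0 = normal.
    reader_a = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    reader_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
    print("kappa:", round(cohen_kappa(reader_a, reader_b), 3))

    # Made-up measurements, in arbitrary units.
    mean, sd, se, (low, high) = mean_confidence_interval([2, 4, 4, 4, 5, 5, 7, 9])
    print("mean:", mean, "SD:", round(sd, 3), "SE:", round(se, 3))
    print("approximate 95% CI:", round(low, 3), "to", round(high, 3))
```

In the made-up reader example, the 2 readers agree on 8 of 10 studies (80%), yet κ is only about 0.58, because roughly half of that agreement would be expected by chance alone. In the second example, the interval width is set entirely by the SE, which is the sense in which the confidence interval relies on the SE.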