Department of Forensic and Neurodevelopmental Sciences, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.
Sackler Institute for Translational Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom.
PLoS Comput Biol. 2021 Nov 18;17(11):e1009477. doi: 10.1371/journal.pcbi.1009477. eCollection 2021 Nov.
Over the past decade, biomarker discovery has become a key goal in psychiatry to aid the more reliable diagnosis and prognosis of heterogeneous psychiatric conditions and the development of tailored therapies. Nevertheless, the prevailing statistical approach is still the mean group comparison between "cases" and "controls," which tends to ignore within-group variability. In this educational article, we used empirical data simulations to investigate how effect size, sample size, and the shape of distributions impact the interpretation of mean group differences for biomarker discovery. We then applied these statistical criteria to evaluate biomarker discovery in one area of psychiatric research: autism research. Across the most influential areas of autism research, effect size estimates ranged from small (d = 0.21, anatomical structure) to medium (d = 0.36, electrophysiology; d = 0.5, eye-tracking) to large (d = 1.1, theory of mind). We show that in normal distributions, this translates to approximately 45% to 63% of cases performing within 1 standard deviation (SD) of the typical range, i.e., they do not have a deficit/atypicality in a statistical sense. For a measure to have diagnostic utility as defined by 80% sensitivity and 80% specificity, a Cohen's d of 1.66 is required, with 40% of cases still falling within 1 SD. However, in both normal and nonnormal distributions, 1 (skewness) or 2 (platykurtic, bimodal) biologically plausible subgroups may exist despite small or even nonsignificant mean group differences. This conclusion contrasts sharply with the way mean group differences are frequently reported. Over 95% of studies omitted the qualifier "on average" when summarising their findings in their abstracts ("autistic people have deficits in X"), which can be misleading as it implies that the group-level difference applies to all individuals in that group. We outline practical approaches and steps for researchers to explore mean group comparisons for the discovery of stratification biomarkers.
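The link between Cohen's d, within-group overlap, and diagnostic accuracy can be illustrated with a minimal sketch. It is not the authors' simulation code. It assumes an idealised textbook model: controls follow N(0, 1), cases follow N(d, 1), and the diagnostic cut-off sits at the midpoint d/2, so that sensitivity equals specificity. The function names are hypothetical. Under these assumptions the analytic effect size needed for 80% sensitivity and 80% specificity comes out near 1.68, close to the 1.66 reported above. The percentages in the abstract come from the authors' own simulations and definitions, which are not given in this excerpt, so the values printed here need not match them.

```python
from statistics import NormalDist

# Assumption: controls ~ N(0, 1) and cases ~ N(d, 1), i.e. equal variances
# and d expressed in units of the common standard deviation.
STD_NORMAL = NormalDist()


def fraction_within_one_sd(d: float) -> float:
    """Share of cases lying within +/-1 SD of the control mean."""
    cases = NormalDist(mu=d, sigma=1.0)
    return cases.cdf(1.0) - cases.cdf(-1.0)


def balanced_accuracy_at_midpoint(d: float) -> float:
    """Sensitivity (= specificity) when the cut-off is placed at d / 2."""
    return STD_NORMAL.cdf(d / 2.0)


def d_for_balanced_accuracy(target: float) -> float:
    """Effect size at which sensitivity and specificity both equal `target`."""
    return 2.0 * STD_NORMAL.inv_cdf(target)


if __name__ == "__main__":
    # Effect sizes quoted in the abstract.
    for d in (0.21, 0.36, 0.5, 1.1, 1.66):
        print(
            f"d = {d:4.2f}: "
            f"{fraction_within_one_sd(d):.1%} of cases within 1 SD of controls, "
            f"sensitivity = specificity = {balanced_accuracy_at_midpoint(d):.1%}"
        )
    print(f"d needed for 80%/80%: {d_for_balanced_accuracy(0.80):.2f}")
```

The midpoint cut-off is a design choice made for simplicity. Any other threshold trades sensitivity against specificity, and nonnormal or unequal-variance groups would require simulation rather than these closed-form expressions.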