Johns Hopkins University, Baltimore, Maryland, United States of America.
Child Mind Institute, New York, New York, United States of America.
PLoS Comput Biol. 2021 Sep 16;17(9):e1009279. doi: 10.1371/journal.pcbi.1009279. eCollection 2021 Sep.
Replicability, the ability to replicate scientific findings, is a prerequisite for scientific discovery and clinical utility. Troublingly, we are in the midst of a replicability crisis. A key to replicability is that multiple measurements of the same item (e.g., experimental sample or clinical participant) under fixed experimental constraints are relatively similar to one another. Thus, statistics that quantify the relative contributions of accidental deviations (such as measurement error) as compared to systematic deviations (such as individual differences) are critical. We demonstrate that existing replicability statistics, such as the intra-class correlation coefficient and fingerprinting, fail to adequately differentiate between accidental and systematic deviations in very simple settings. We therefore propose a novel statistic, discriminability, which quantifies the degree to which an individual's samples are relatively similar to one another, without restricting the data to be univariate, Gaussian, or even Euclidean. Using this statistic, we introduce the possibility of optimizing experimental design via increasing discriminability and prove that optimizing discriminability improves performance bounds in subsequent inference tasks. In extensive simulated and real datasets (focusing on brain imaging and demonstrating on genomics), only optimizing data discriminability improves performance on all subsequent inference tasks for each dataset. We therefore suggest that designing experiments and analyses to optimize discriminability may be a crucial step in solving the replicability crisis, and more generally, mitigating accidental measurement error.
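The statistic described above can be illustrated with a short sketch. This is not the authors' reference implementation; it is a minimal estimator, under the assumption that discriminability is estimated as the fraction of comparisons in which the distance between two measurements of the same item is smaller than the distance from one of those measurements to a measurement of a different item. The function name, the choice of Euclidean distance, and the strict-inequality tie handling are illustrative assumptions.

```python
import numpy as np

def discriminability(X, labels):
    """Estimate discriminability: the probability that two measurements
    of the same item are closer to each other than one of them is to a
    measurement of a different item.

    X      : (n_measurements, n_features) array of measurements.
    labels : length-n array of item (e.g., subject) identifiers.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all measurements.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(labels)
    hits = total = 0
    for i in range(n):
        for j in range(n):
            if i == j or labels[i] != labels[j]:
                continue  # only within-item pairs (i, j)
            for k in range(n):
                if labels[k] == labels[i]:
                    continue  # only cross-item comparisons (i, k)
                total += 1
                if D[i, j] < D[i, k]:
                    hits += 1
    return hits / total
```

For example, two subjects measured twice each, with repeat scans identical and subjects well separated, yield a discriminability of 1.0: every within-subject distance is smaller than every cross-subject distance.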