Department of Statistics and Operations Research, Tel-Aviv University, Tel-Aviv, Israel.
Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York, United States of America.
PLoS Biol. 2023 May 1;21(5):e3002082. doi: 10.1371/journal.pbio.3002082. eCollection 2023 May.
The utility of mouse and rat studies critically depends on their replicability in other laboratories. A widely advocated approach to improving replicability is the rigorous control of predefined animal or experimental conditions, known as standardization. However, this approach limits the generalizability of the findings to only the standardized conditions and is a potential cause of, rather than a solution to, what has been called a replicability crisis. Alternative strategies include estimating the heterogeneity of effects across laboratories, either through designs that vary testing conditions or by direct statistical analysis of laboratory variation. We previously evaluated our statistical approach for estimating the interlaboratory replicability of a single-laboratory discovery. Those results, however, came from a well-coordinated, multi-lab phenotyping study and did not extend to the more realistic setting in which laboratories operate independently of each other. Here, we sought to test our statistical approach in a realistic prospective experiment, in mice, using 152 results from 5 independent published studies deposited in the Mouse Phenome Database (MPD). In independent replication experiments at 3 laboratories, we found that 53 of the results were replicable, so the other 99 were considered non-replicable. Of the 99 non-replicable results, 59 were statistically significant (at 0.05) in their original single-lab analysis, putting the probability that a non-replicable result is declared a single-lab statistical discovery at 59.6%. We then introduced the dimensionless "Genotype-by-Laboratory" (GxL) factor, defined as the ratio of the standard deviation of the GxL interaction to the standard deviation within groups. Using the GxL factor reduced the number of single-lab statistical discoveries and, with it, reduced the probability that a non-replicable result is discovered in the single lab to 12.1%. This stricter criterion naturally reduces the power to make replicable discoveries, but the reduction was small (from 87% to 66%), a small price for the large improvement in replicability. Tools and data needed for the above GxL adjustment are publicly available at the MPD and will become increasingly useful as the range of assays and testing conditions in this resource increases.
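As a rough illustration of how a GxL factor could enter a two-group comparison, the sketch below inflates the standard error of a genotype difference. The abstract does not give the test formula, so the form used here is an assumption: the interaction variance is taken to add 2·γ²·s² to the variance of the difference, where γ is the GxL factor and s is the pooled within-group standard deviation. The function names, the measurements, and the value γ = 0.5 are hypothetical, and any degrees-of-freedom correction is omitted.

```python
import math
from statistics import mean, variance


def pooled_sd(a, b):
    """Pooled within-group standard deviation of two samples."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var)


def t_statistics(a, b, gxl_factor):
    """Return (plain t, GxL-adjusted t) for the difference in group means.

    Assumption (not stated in the abstract): the GxL interaction adds
    2 * gxl_factor**2 * s**2 to the variance of the difference.
    """
    n1, n2 = len(a), len(b)
    s = pooled_sd(a, b)
    diff = mean(a) - mean(b)
    t_plain = diff / (s * math.sqrt(1 / n1 + 1 / n2))
    t_adjusted = diff / (s * math.sqrt(1 / n1 + 1 / n2 + 2 * gxl_factor ** 2))
    return t_plain, t_adjusted


# Hypothetical measurements for two genotypes in a single laboratory.
strain_a = [10.1, 11.3, 9.8, 10.9, 10.4, 11.0]
strain_b = [8.9, 9.6, 9.1, 9.9, 8.7, 9.4]

t_plain, t_adjusted = t_statistics(strain_a, strain_b, gxl_factor=0.5)
print(f"plain t = {t_plain:.2f}, GxL-adjusted t = {t_adjusted:.2f}")

# Share of non-replicable results that were significant in the original
# single-lab analysis, using the counts reported in the abstract.
share_significant = round(100 * 59 / 99, 1)
print(f"{share_significant}%")
```

Because the added term does not shrink as group sizes grow, the adjusted statistic is always smaller in magnitude than the plain one when the factor is nonzero, which matches the reported drop in single-lab discoveries after adjustment.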