Eliminating accidental deviations to minimize generalization error and maximize replicability: Applications in connectomics and genomics.

Affiliations

Johns Hopkins University, Baltimore, Maryland, United States of America.

Child Mind Institute, New York, New York, United States of America.

Publication information

PLoS Comput Biol. 2021 Sep 16;17(9):e1009279. doi: 10.1371/journal.pcbi.1009279. eCollection 2021 Sep.

Abstract

Replicability, the ability to replicate scientific findings, is a prerequisite for scientific discovery and clinical utility. Troublingly, we are in the midst of a replicability crisis. A key to replicability is that multiple measurements of the same item (e.g., experimental sample or clinical participant) under fixed experimental constraints are relatively similar to one another. Thus, statistics that quantify the relative contributions of accidental deviations (such as measurement error) as compared to systematic deviations (such as individual differences) are critical. We demonstrate that existing replicability statistics, such as intra-class correlation coefficient and fingerprinting, fail to adequately differentiate between accidental and systematic deviations in very simple settings. We therefore propose a novel statistic, discriminability, which quantifies the degree to which an individual's samples are relatively similar to one another, without restricting the data to be univariate, Gaussian, or even Euclidean. Using this statistic, we introduce the possibility of optimizing experimental design via increasing discriminability and prove that optimizing discriminability improves performance bounds in subsequent inference tasks. In extensive simulated and real datasets (focusing on brain imaging and demonstrating on genomics), only optimizing data discriminability improves performance on all subsequent inference tasks for each dataset. We therefore suggest that designing experiments and analyses to optimize discriminability may be a crucial step in solving the replicability crisis, and more generally, mitigating accidental measurement error.
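The abstract does not spell out how discriminability is estimated, so the following is only a minimal sketch of one plausible estimator under stated assumptions: it uses Euclidean distance, weights every comparison equally, and counts ties as half. The function name `discriminability_sketch` and these choices are illustrative, not the authors' implementation. It returns the fraction of comparisons in which two measurements of the same item are closer to each other than a measurement of that item is to a measurement of a different item.

```python
import numpy as np


def discriminability_sketch(X, labels):
    """Illustrative estimate: fraction of comparisons in which a
    within-item distance is smaller than a between-item distance.

    X      : array of shape (n_measurements, n_features)
    labels : array of shape (n_measurements,), the item each row measures

    Assumptions (not taken from the abstract): Euclidean distance,
    equal weighting of all comparisons, ties counted as 0.5.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distances between all measurements.
    diffs = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diffs ** 2).sum(axis=-1))

    # same[i, j] is True when rows i and j measure the same item.
    same = labels[:, None] == labels[None, :]

    hits = 0.0
    total = 0
    n = len(labels)
    for i in range(n):
        between = D[i, ~same[i]]  # distances from i to other items
        for j in range(n):
            if i == j or not same[i, j]:
                continue
            # Compare one within-item distance to every between-item one.
            hits += np.sum(D[i, j] < between) + 0.5 * np.sum(D[i, j] == between)
            total += between.size

    if total == 0:
        raise ValueError("need repeated measurements of at least two items")
    return hits / total


# Two items, two measurements each; repeated measurements sit close together.
X = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
score = discriminability_sketch(X, labels)
```

Under this sketch, a score near 1 means repeated measurements of an item are reliably closer to each other than to other items' measurements, which matches the abstract's description of accidental deviations being small relative to systematic ones. Swapping the distance computation for another dissimilarity would reflect the abstract's point that the data need not be Euclidean.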

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4978/8500408/cb1869691708/pcbi.1009279.g001.jpg
