McGovern Kyle C, Silverman Justin D
Program in Bioinformatics and Genomics, Pennsylvania State University, University Park, PA, USA.
College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, USA.
BMC Bioinformatics. 2025 Jul 1;26(1):164. doi: 10.1186/s12859-025-06177-2.
BACKGROUND: Methods for differential expression and differential abundance analysis often rely on normalization to address sample-to-sample variation in sequencing depth. However, normalizations imply strict, unrealistic assumptions about the unmeasured scale of biological systems (e.g., microbial load or total cellular transcription). Even slight errors in these assumptions introduce bias, leading to elevated false positive and negative rates.

RESULTS: We introduce interval assumptions as a generalization of normalizations. Unlike normalizations, our interval methods allow researchers to account for potential errors in assumptions about the system scale. Interval assumptions are also customizable and allow researchers to express more biologically plausible assumptions about scale. Interval assumptions even generalize Quantitative Microbiome Profiling (QMP), allowing researchers to account for errors in flow cytometry-based measurements of total cellular concentration. We develop a novel hypothesis testing framework that allows us to integrate interval assumptions into existing tools. We develop a modified version of the popular ALDEx2 method using interval assumptions rather than normalizations. Through real and simulated data analyses, we find that interval assumptions can dramatically decrease false positive rates (i.e., from 45% to 5%) while retaining or increasing statistical power. We also study interval assumptions under misspecification and show they still improve on normalizations.

CONCLUSIONS: Interval assumptions enhance the rigor and reproducibility of differential expression and differential abundance analyses. Our results add to a growing body of literature arguing that normalizations should be replaced with alternative methods that allow researchers to account for scale uncertainty. However, compared to recent alternatives like scale models and sensitivity analyses, interval assumptions are easier to use, are more robust to misspecification, and have stronger and more interpretable inferential guarantees.
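The link between a normalization and a scale assumption can be illustrated with a small deterministic sketch. It is not the authors' hypothesis test or the modified ALDEx2 method; the data, the function names, and the interval endpoints are all hypothetical. The sketch relies on one identity: an entity's absolute log fold change equals its log fold change in relative abundance plus the log fold change in total scale. A centered log-ratio (CLR) style normalization amounts to fixing the scale term at a single value, whereas an interval assumption only bounds it.

```python
import math

def composition_lfc(props_a, props_b):
    """Per-entity log2 fold change in relative abundance (condition B vs. A)."""
    return [math.log2(b / a) for a, b in zip(props_a, props_b)]

def lfc_bounds(comp_lfc, scale_lo, scale_hi):
    """Bounds on each absolute log2 fold change when the log2 fold change
    in total scale is assumed to lie in [scale_lo, scale_hi]."""
    return [(c + scale_lo, c + scale_hi) for c in comp_lfc]

def excludes_zero(bounds):
    """True where the whole interval lies strictly above or below zero."""
    return [lo > 0 or hi < 0 for lo, hi in bounds]

# Hypothetical absolute abundances: only entity 0 truly changes (4-fold up).
abs_a = [100.0, 100.0, 100.0, 100.0]
abs_b = [400.0, 100.0, 100.0, 100.0]

# Sequencing only reveals proportions, not totals.
props_a = [x / sum(abs_a) for x in abs_a]
props_b = [x / sum(abs_b) for x in abs_b]
comp = composition_lfc(props_a, props_b)

# CLR-style point assumption: the scale term equals minus the mean
# compositional log fold change (geometric mean assumed unchanged).
clr_scale = -sum(comp) / len(comp)
clr_lfc = [c + clr_scale for c in comp]

# Hypothetical interval assumption: total scale rises by 0 to 1.5 log2 units.
bounds = lfc_bounds(comp, 0.0, 1.5)
flags = excludes_zero(bounds)

print("CLR-based LFC:", [round(v, 3) for v in clr_lfc])
print("Interval bounds:", [(round(lo, 3), round(hi, 3)) for lo, hi in bounds])
print("Interval excludes zero:", flags)
```

In this toy case the point assumption assigns a log fold change of -0.5 to the three unchanged entities, while the interval flags only the entity that actually changed. A real analysis would also need to account for sampling variation, which this sketch ignores.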