Segmented correspondence curve regression for quantifying covariate effects on the reproducibility of high-throughput experiments.

Affiliations

School of Economics and Finance, Xi'an Jiaotong University, Xi'an, China.

Department of Statistics, Pennsylvania State University, Pennsylvania, USA.

Publication information

Biometrics. 2023 Sep;79(3):2272-2285. doi: 10.1111/biom.13757. Epub 2022 Sep 19.

Abstract

High-throughput biological experiments are essential tools for identifying biologically interesting candidates in large-scale omics studies. The results of a high-throughput biological experiment rely heavily on the operational factors chosen in its experimental and data-analytic procedures. Understanding how these operational factors influence the reproducibility of the experimental outcome is critical for selecting the optimal parameter settings and designing reliable high-throughput workflows. However, the influence of an operational factor may differ between strong and weak candidates in a high-throughput experiment, complicating the selection of parameter settings. To address this issue, we propose a novel segmented regression model, called segmented correspondence curve regression, to assess the influence of operational factors on the reproducibility of high-throughput experiments. Our model dissects the heterogeneous effects of operational factors on strong and weak candidates, providing a principled way to select operational parameters. Based on this framework, we also develop a sup-likelihood ratio test for the existence of heterogeneity. Simulation studies show that our estimation and testing procedures yield well-calibrated type I errors and are substantially more powerful in detecting and locating the differences in reproducibility across workflows than the existing method. Using this model, we investigated an important design question for ChIP-seq experiments: How many reads should one sequence to obtain reliable results in a cost-effective way? Our results reveal new insights into the impact of sequencing depth on the binding-site identification reproducibility, helping biologists determine the most cost-effective sequencing depth to achieve sufficient reproducibility for their study goals.
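The abstract does not spell out the model, so the following is only an illustrative sketch of the general idea behind a correspondence curve: for each fraction t, count the share of candidates that rank in the top t of both replicates. It is not the authors' segmented regression or their test. The function names, the toy scores, and the tie-breaking rule are all assumptions made for this example.

```python
def top_k_indices(scores, k):
    """Return the indices of the k largest scores.

    Ties are broken by original position (Python's sort is stable).
    This tie rule is an assumption of the sketch, not taken from the paper.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return set(order[:k])


def empirical_correspondence_curve(rep1, rep2, t_grid):
    """Share of candidates ranked in the top t fraction of BOTH replicates.

    rep1, rep2: significance scores for the same candidates in two
    replicate experiments (larger means stronger).
    t_grid: fractions in (0, 1] at which to evaluate the curve.
    """
    if len(rep1) != len(rep2):
        raise ValueError("replicates must score the same candidates")
    n = len(rep1)
    curve = []
    for t in t_grid:
        # Number of candidates in the top t fraction; the small offset
        # guards against floating-point error in t * n.
        k = int(t * n + 1e-9)
        shared = top_k_indices(rep1, k) & top_k_indices(rep2, k)
        curve.append(len(shared) / n)
    return curve


# Hypothetical scores: the two strongest candidates agree across
# replicates, while the weaker ones are ranked inconsistently.
rep1 = [9.1, 7.4, 6.0, 2.2, 1.5]
rep2 = [8.7, 7.9, 1.1, 2.5, 3.0]
t_grid = [0.2, 0.4, 0.6, 0.8, 1.0]
curve = empirical_correspondence_curve(rep1, rep2, t_grid)
print(curve)
```

With perfectly reproducible rankings the curve equals t at every grid point. In the toy data it tracks t for the strongest candidates and falls below t once weaker candidates enter, which is the kind of strong-versus-weak difference the abstract describes.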
