Segmented correspondence curve regression for quantifying covariate effects on the reproducibility of high-throughput experiments.

Affiliations

School of Economics and Finance, Xi'an Jiaotong University, Xi'an, China.

Department of Statistics, Pennsylvania State University, Pennsylvania, USA.

Publication information

Biometrics. 2023 Sep;79(3):2272-2285. doi: 10.1111/biom.13757. Epub 2022 Sep 19.

DOI:10.1111/biom.13757

PMID:36056911

Abstract

High-throughput biological experiments are essential tools for identifying biologically interesting candidates in large-scale omics studies. The results of a high-throughput biological experiment rely heavily on the operational factors chosen in its experimental and data-analytic procedures. Understanding how these operational factors influence the reproducibility of the experimental outcome is critical for selecting the optimal parameter settings and designing reliable high-throughput workflows. However, the influence of an operational factor may differ between strong and weak candidates in a high-throughput experiment, complicating the selection of parameter settings. To address this issue, we propose a novel segmented regression model, called segmented correspondence curve regression, to assess the influence of operational factors on the reproducibility of high-throughput experiments. Our model dissects the heterogeneous effects of operational factors on strong and weak candidates, providing a principled way to select operational parameters. Based on this framework, we also develop a sup-likelihood ratio test for the existence of heterogeneity. Simulation studies show that our estimation and testing procedures yield well-calibrated type I errors and are substantially more powerful in detecting and locating the differences in reproducibility across workflows than the existing method. Using this model, we investigated an important design question for ChIP-seq experiments: How many reads should one sequence to obtain reliable results in a cost-effective way? Our results reveal new insights into the impact of sequencing depth on the binding-site identification reproducibility, helping biologists determine the most cost-effective sequencing depth to achieve sufficient reproducibility for their study goals.
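The abstract does not define the correspondence curve on which the regression is built. As a rough illustration only, the sketch below assumes the definition commonly used in the reproducibility literature: for a fraction t, the curve gives the proportion of candidates ranked within the top t on both replicates. The function name `correspondence_curve` and the toy scores are hypothetical. The code does not implement the segmented regression model or the sup-likelihood ratio test proposed in the paper.

```python
import numpy as np

def correspondence_curve(scores_a, scores_b, grid):
    """Empirical proportion of candidates in the top fraction t of BOTH replicates.

    Assumes larger scores mean stronger candidates; ties are broken by input order.
    This is an illustrative sketch, not the authors' implementation.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    n = a.size

    # Convert scores to ranks, where rank 1 is the strongest candidate.
    rank_a = np.empty(n, dtype=int)
    rank_a[np.argsort(-a, kind="stable")] = np.arange(1, n + 1)
    rank_b = np.empty(n, dtype=int)
    rank_b[np.argsort(-b, kind="stable")] = np.arange(1, n + 1)

    # A candidate is in the top t of both lists iff its worse rank is <= t * n.
    worst = np.maximum(rank_a, rank_b)
    cutoffs = [np.floor(t * n + 1e-9) for t in grid]
    return np.array([np.mean(worst <= c) for c in cutoffs])

# Toy example: two replicates that rank five candidates identically.
grid = [0.2, 0.6, 1.0]
curve = correspondence_curve([5, 4, 3, 2, 1], [50, 40, 30, 20, 10], grid)
print(dict(zip(grid, curve)))
```

For perfectly concordant replicates the curve lies on the diagonal, and it falls below the diagonal as agreement weakens. Comparing such curves across operational settings (for example, sequencing depths) at small t versus large t is one informal way to see why the effect of a factor might differ between strong and weak candidates.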

Similar articles

Segmented correspondence curve regression for quantifying covariate effects on the reproducibility of high-throughput experiments.

Biometrics. 2023 Sep;79(3):2272-2285. doi: 10.1111/biom.13757. Epub 2022 Sep 19.

The Lived Experience of Autistic Adults in Employment: A Systematic Search and Synthesis.

Autism Adulthood. 2024 Dec 2;6(4):495-509. doi: 10.1089/aut.2022.0114. eCollection 2024 Dec.

Stigma Management Strategies of Autistic Social Media Users.

Autism Adulthood. 2025 May 28;7(3):273-282. doi: 10.1089/aut.2023.0095. eCollection 2025 Jun.

Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy.

Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340.

Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19.

Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3.

Effectiveness and cost-effectiveness of computer and other electronic aids for smoking cessation: a systematic review and network meta-analysis.

Health Technol Assess. 2012;16(38):1-205, iii-v. doi: 10.3310/hta16380.

Measures implemented in the school setting to contain the COVID-19 pandemic.

Cochrane Database Syst Rev. 2022 Jan 17;1(1):CD015029. doi: 10.1002/14651858.CD015029.

Comparison of self-administered survey questionnaire responses collected using mobile apps versus other methods.

Cochrane Database Syst Rev. 2015 Jul 27;2015(7):MR000042. doi: 10.1002/14651858.MR000042.pub2.

Systemic pharmacological treatments for chronic plaque psoriasis: a network meta-analysis.

Cochrane Database Syst Rev. 2021 Apr 19;4(4):CD011535. doi: 10.1002/14651858.CD011535.pub4.

Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.

Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.

Cited by

Hypothesis test of arbitrary parametric structure in a generalized additive model.

medRxiv. 2025 May 13:2025.05.12.25327450. doi: 10.1101/2025.05.12.25327450.

Reproducibility of mass spectrometry based metabolomics data.

BMC Bioinformatics. 2021 Sep 7;22(1):423. doi: 10.1186/s12859-021-04336-9.
