Interdisciplinary Ph.D. Program in Biostatistics, The Ohio State University, Columbus, OH, United States of America.
Department of Statistics, The Ohio State University, Columbus, OH, United States of America.
PLoS One. 2024 Jan 17;19(1):e0287521. doi: 10.1371/journal.pone.0287521. eCollection 2024.
The ability to simulate high-throughput data with high fidelity to real experimental data is fundamental for benchmarking methods that detect true long-range chromatin interactions mediated by a specific protein. Yet such tools are not currently available. To fill this gap, we develop an in silico experimental procedure, ChIA-Sim, which imitates the experimental procedures that produce real ChIA-PET, Hi-ChIP, or PLAC-seq data. We demonstrate the fidelity of ChIA-Sim to real data by using guiding characteristics of several real datasets to generate simulated data. We also use ChIA-Sim data to show how the in silico procedure can benchmark methods for significant interaction analysis, evaluating four methods for significant interaction calling (SIC). In particular, we assess each method's performance in terms of correct identification of long-range interactions. We further analyze four experimental datasets from publicly available databases and show that the trends in the results are consistent with those seen in data generated by ChIA-Sim. This serves as additional evidence that ChIA-Sim data closely resemble data produced by the experimental protocols the procedure is modeled after.