Benchmarking time-series data discretization on inference methods.

R.D. Berlin Center for Cell Analysis and Modeling, University of Connecticut School of Medicine, Farmington, CT, USA.

Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA.

Bioinformatics. 2019 Sep 1;35(17):3102-3109. doi: 10.1093/bioinformatics/btz036.

SUMMARY

The rapid development in quantitatively measuring DNA, RNA and protein has generated a great interest in the development of reverse-engineering methods, that is, data-driven approaches to infer the network structure or dynamical model of the system. Many reverse-engineering methods require discrete quantitative data as input, while many experimental data are continuous. Some studies have started to reveal the impact that the choice of data discretization has on the performance of reverse-engineering methods. However, more comprehensive studies are still greatly needed to systematically and quantitatively understand the impact that discretization methods have on inference methods. Furthermore, there is an urgent need for systematic comparative methods that can help select between discretization methods. In this work, we consider four published intracellular networks inferred with their respective time-series datasets. We discretized the data using different discretization methods. Across all datasets, changing the data discretization to a more appropriate one improved the reverse-engineering methods' performance. We observed no universal best discretization method across different time-series datasets. Thus, we propose DiscreeTest, a two-step evaluation metric for ranking discretization methods for time-series data. The underlying assumption of DiscreeTest is that an optimal discretization method should preserve the dynamic patterns observed in the original data across all variables. We used the same datasets and networks to show that DiscreeTest is able to identify an appropriate discretization among several candidate methods. To our knowledge, this is the first time that a method for benchmarking and selecting an appropriate discretization method for time-series data has been proposed.
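To make the idea of comparing candidate discretizations concrete, here is a minimal Python sketch. It is not DiscreeTest itself: the summary does not spell out the two steps of that metric, so the functions `equal_width`, `equal_frequency` and `pattern_error`, and the choice of a rescaled mean absolute difference as the score, are illustrative assumptions only. The sketch discretizes one continuous time series in two common ways and scores how closely each discretized series follows the shape of the original.

```python
import numpy as np


def equal_width(x, k):
    """Discretize x into k levels (0..k-1) using equal-width bins."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), k + 1)
    # Use interior edges only, so the maximum lands in the top level k-1.
    return np.digitize(x, edges[1:-1])


def equal_frequency(x, k):
    """Discretize x into k levels using quantile (equal-frequency) bins."""
    x = np.asarray(x, dtype=float)
    edges = np.quantile(x, np.linspace(0, 1, k + 1))
    return np.digitize(x, edges[1:-1])


def _rescale(v):
    """Rescale a series to [0, 1]; a constant series maps to all zeros."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)


def pattern_error(x, levels):
    """Hypothetical score: mean absolute difference between the rescaled
    original series and the rescaled discretized series (lower is better)."""
    x = np.asarray(x, dtype=float)
    levels = np.asarray(levels, dtype=float)
    return float(np.mean(np.abs(_rescale(x) - _rescale(levels))))


# Toy time series with one late spike (made-up values, not from the paper).
series = [0.0, 1.0, 2.0, 3.0, 10.0]
candidates = {
    "equal_width": equal_width(series, 2),
    "equal_frequency": equal_frequency(series, 2),
}
scores = {name: pattern_error(series, lv) for name, lv in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

On this toy series the equal-width binarization keeps the single late spike as the only high state, so it scores lower than the equal-frequency one. With a differently shaped series the ranking can flip, which is in line with the observation that no discretization method is universally best.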

AVAILABILITY AND IMPLEMENTATION

All the datasets, reverse-engineering methods and source code used in this paper are available in the Vera-Licona lab's GitHub repository: https://github.com/VeraLiconaResearchGroup/Benchmarking_TSDiscretizations.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Benchmarking time-series data discretization on inference methods.

Bioinformatics. 2019 Sep 1;35(17):3102-3109. doi: 10.1093/bioinformatics/btz036.

Discretization of time series data.

J Comput Biol. 2010 Jun;17(6):853-68. doi: 10.1089/cmb.2008.0023.

MICRAT: a novel algorithm for inferring gene regulatory networks using time series gene expression data.

BMC Syst Biol. 2018 Dec 14;12(Suppl 7):115. doi: 10.1186/s12918-018-0635-1.

Comparative study of discretization methods of microarray data for inferring transcriptional regulatory networks.

BMC Bioinformatics. 2010 Oct 19;11:520. doi: 10.1186/1471-2105-11-520.

Topological benchmarking of algorithms to infer gene regulatory networks from single-cell RNA-seq data.

Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae267.

Network inference performance complexity: a consequence of topological, experimental and algorithmic determinants.

Bioinformatics. 2019 Sep 15;35(18):3421-3432. doi: 10.1093/bioinformatics/btz105.

A neuro-evolution approach to infer a Boolean network from time-series gene expressions.

Bioinformatics. 2020 Dec 30;36(Suppl_2):i762-i769. doi: 10.1093/bioinformatics/btaa840.

GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods.

Bioinformatics. 2011 Aug 15;27(16):2263-70. doi: 10.1093/bioinformatics/btr373. Epub 2011 Jun 22.

OpenBioLink: a benchmarking framework for large-scale biomedical link prediction.

Bioinformatics. 2020 Jul 1;36(13):4097-4098. doi: 10.1093/bioinformatics/btaa274.

HiDi: an efficient reverse engineering schema for large-scale dynamic regulatory network reconstruction using adaptive differentiation.

Bioinformatics. 2017 Dec 15;33(24):3964-3972. doi: 10.1093/bioinformatics/btx501.

Cited by

scBoolSeq: Linking scRNA-seq statistics and Boolean dynamics.

PLoS Comput Biol. 2024 Jul 8;20(7):e1011620. doi: 10.1371/journal.pcbi.1011620. eCollection 2024 Jul.
