Benchmarking time-series data discretization on inference methods.

Affiliations

R.D. Berlin Center for Cell Analysis and Modeling, University of Connecticut School of Medicine, Farmington, CT, USA.

Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA.

Publication information

Bioinformatics. 2019 Sep 1;35(17):3102-3109. doi: 10.1093/bioinformatics/btz036.

Abstract

SUMMARY

Rapid progress in quantitatively measuring DNA, RNA and protein has generated great interest in reverse-engineering methods, that is, data-driven approaches to infer the network structure or dynamical model of a system. Many reverse-engineering methods require discrete quantitative data as input, while many experimental data are continuous. Some studies have started to reveal the impact that the choice of data discretization has on the performance of reverse-engineering methods. However, more comprehensive studies are still needed to understand, systematically and quantitatively, the impact that discretization methods have on inference methods. Furthermore, there is an urgent need for systematic comparative methods that can help select between discretization methods. In this work, we consider four published intracellular networks inferred with their respective time-series datasets. We discretized the data using different discretization methods. Across all datasets, changing the data discretization to a more appropriate one improved the reverse-engineering methods' performance. We observed no universal best discretization method across different time-series datasets. Thus, we propose DiscreeTest, a two-step evaluation metric for ranking discretization methods for time-series data. The underlying assumption of DiscreeTest is that an optimal discretization method should preserve the dynamic patterns observed in the original data across all variables. We used the same datasets and networks to show that DiscreeTest is able to identify an appropriate discretization among several candidate methods. To our knowledge, this is the first time that a method for benchmarking and selecting an appropriate discretization method for time-series data has been proposed.
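To make the idea of ranking discretizations by how well they preserve dynamic patterns more concrete, the following is a minimal sketch in Python. It is not the DiscreeTest procedure itself, whose two steps are not detailed in this abstract. The two discretizers (equal-width and equal-frequency binning), the distance measure (mean absolute gap between the min-max rescaled original and discretized series), and all function names are assumptions chosen purely for illustration.

```python
from bisect import bisect_right


def equal_width(series, k):
    """Discretize into k levels (0..k-1) using bins of equal width."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / k
    # The maximum value would fall into bin k, so clamp it to k - 1.
    return [min(int((x - lo) / width), k - 1) for x in series]


def equal_frequency(series, k):
    """Discretize into k levels using cut points at empirical quantiles."""
    ordered = sorted(series)
    n = len(ordered)
    cuts = [ordered[(i * n) // k] for i in range(1, k)]
    return [bisect_right(cuts, x) for x in series]


def rescale(series):
    """Min-max rescale to [0, 1] so both curves share one scale."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(x - lo) / (hi - lo) for x in series]


def pattern_distance(original, discretized):
    """Illustrative (assumed) proxy for lost dynamics: the mean absolute
    gap between the rescaled original and rescaled discretized series."""
    a, b = rescale(original), rescale(discretized)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def rank_discretizations(dataset, methods, k):
    """Average the distance over all variables and sort methods so the
    one that distorts the time courses least comes first."""
    scores = {}
    for name, method in methods.items():
        per_variable = [
            pattern_distance(series, method(series, k))
            for series in dataset.values()
        ]
        scores[name] = sum(per_variable) / len(per_variable)
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical toy time course for a single variable.
    toy = {"gene_a": [0.0, 0.1, 0.2, 10.0]}
    methods = {"equal_width": equal_width, "equal_frequency": equal_frequency}
    for name, score in rank_discretizations(toy, methods, k=2):
        print(name, round(score, 4))
```

Under these assumptions, a lower score means the discretized curve stays closer to the shape of the original time course. Averaging over all variables mirrors the abstract's requirement that patterns be preserved across all variables, and it also illustrates why no single method wins everywhere: the ranking depends on the distribution of values in each dataset.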

AVAILABILITY AND IMPLEMENTATION

All datasets, reverse-engineering methods and source code used in this paper are available in the Vera-Licona lab GitHub repository: https://github.com/VeraLiconaResearchGroup/Benchmarking_TSDiscretizations.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
