Suppr超能文献

从时间序列数据中学习布尔网络时,在样本复杂度和计算复杂度之间进行权衡。

A trade-off between sample complexity and computational complexity in learning Boolean networks from time-series data.

机构信息

McGill University, 3480 University St., Montreal, Quebec, Canada.

出版信息

IEEE/ACM Trans Comput Biol Bioinform. 2010 Jan-Mar;7(1):118-25. doi: 10.1109/TCBB.2008.38.

Abstract

A key problem in molecular biology is to infer regulatory relationships between genes from expression data. This paper studies a simplified model of such inference problems in which one or more Boolean variables, modeling, for example, the expression levels of genes, each depend deterministically on a small but unknown subset of a large number of Boolean input variables. Our model assumes that the expression data comprises a time series, in which successive samples may be correlated. We provide bounds on the expected amount of data needed to infer the correct relationships between output and input variables. These bounds improve and generalize previous results for Boolean network inference and continuous-time switching network inference. Although the computational problem is intractable in general, we describe a fixed-parameter tractable algorithm that is guaranteed to provide at least a partial solution to the problem. Most interestingly, both the sample complexity and computational complexity of the problem depend on the strength of correlations between successive samples in the time series but in opposing ways. Uncorrelated samples minimize the total number of samples needed while maximizing computational complexity; a strong correlation between successive samples has the opposite effect. This observation has implications for the design of experiments for measuring gene expression.

摘要

分子生物学中的一个关键问题是从表达数据中推断基因之间的调控关系。本文研究了这种推断问题的简化模型,其中一个或多个布尔变量(例如,基因的表达水平),可以确定性地取决于大量布尔输入变量中的一小部分未知变量。我们的模型假设表达数据包含一个时间序列,其中连续的样本可能相关。我们提供了推断输出和输入变量之间正确关系所需的数据量的期望上限。这些上限改进并推广了以前用于布尔网络推断和连续时间切换网络推断的结果。尽管计算问题通常是不可解的,但我们描述了一种固定参数可解的算法,该算法保证为该问题提供至少部分解决方案。最有趣的是,问题的样本复杂度和计算复杂度都取决于时间序列中连续样本之间的相关性强度,但方式相反。无相关样本最小化了所需的总样本数量,同时最大化了计算复杂度;连续样本之间的强相关性则具有相反的效果。这一观察结果对测量基因表达的实验设计具有启示意义。

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验