Predictive minimum description length principle approach to inferring gene regulatory networks.

Affiliation

School of Computing, The University of Southern Mississippi, MS 39402, USA.

Publication information

Adv Exp Med Biol. 2011;696:37-43. doi: 10.1007/978-1-4419-7046-6_4.

Abstract

Reverse engineering of gene regulatory networks using information-theoretic models has received much attention because of its simplicity, low computational cost, and ability to infer large networks. A major problem with information-theoretic models is determining the threshold that defines regulatory relationships between genes. The minimum description length (MDL) principle has been applied to address this problem. In the MDL principle, the description length is the sum of the model length and the data encoding length. A user-specified fine-tuning parameter controls the trade-off between model and data encoding, but its optimal value is difficult to find. In this work, we propose a new inference algorithm that combines mutual information (MI), conditional mutual information (CMI), and the predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information-theoretic quantities MI and CMI determine the regulatory relationships between genes, and the PMDL principle attempts to determine the best MI threshold without a user-specified fine-tuning parameter. The performance of the proposed algorithm is evaluated on both synthetic time-series data sets and a biological time-series data set (Saccharomyces cerevisiae). The results show that, compared with an existing MDL algorithm, the proposed algorithm produces fewer false edges and significantly improves precision.
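The MI/CMI step described above can be sketched as follows: estimate MI between every gene pair, keep pairs above an MI threshold, then use CMI with a third gene to discard pairs whose dependence is explained by that gene. This is a minimal illustration, not the authors' implementation. Assumptions not taken from the source: expression values are already discretized; entropies use plug-in (frequency-count) estimates in bits; the fixed `threshold` value and the pruning rule in the hypothetical `prune_with_cmi` helper are illustrative choices; the PMDL procedure the paper uses to select the MI threshold is not reproduced.

```python
import math
from collections import Counter


def entropy(*cols):
    """Plug-in joint entropy, in bits, of one or more equal-length discrete columns."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mutual_information(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def candidate_edges(data, threshold):
    """Keep gene pairs whose MI exceeds `threshold` (a hypothetical fixed cutoff here;
    in the proposed algorithm the threshold is chosen by PMDL instead)."""
    genes = sorted(data)
    return [
        (a, b)
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
        if mutual_information(data[a], data[b]) > threshold
    ]


def prune_with_cmi(data, edges, threshold):
    """Hypothetical pruning rule (an assumption, not from the source): drop (a, b)
    if some third gene c explains the dependence, i.e. I(a;b|c) <= threshold."""
    kept = []
    for a, b in edges:
        others = [c for c in data if c not in (a, b)]
        if all(conditional_mutual_information(data[a], data[b], data[c]) > threshold
               for c in others):
            kept.append((a, b))
    return kept


# Toy discretized profiles (0 = low, 1 = high); g2 copies g1, g3 has zero MI with both.
data = {
    "g1": [0, 0, 1, 1, 0, 0, 1, 1],
    "g2": [0, 0, 1, 1, 0, 0, 1, 1],
    "g3": [0, 1, 0, 1, 0, 1, 0, 1],
}
edges = candidate_edges(data, threshold=0.5)
kept = prune_with_cmi(data, edges, threshold=0.5)
print(edges, kept)
```

With this toy data, only the g1-g2 pair exceeds the 0.5-bit cutoff (its MI is 1 bit, while both pairs involving g3 have MI 0), and conditioning on g3 leaves that dependence at 1 bit, so the script prints `[('g1', 'g2')] [('g1', 'g2')]`. The threshold is the sensitive knob here, which is the problem the abstract says PMDL is meant to solve.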