Department of Electrical and Computer Engineering, Texas A&M University; Computational Biology Division, Translational Genomics Research Institute; Department of Pathology, University of Texas M.D. Anderson Cancer Center, USA.
Curr Genomics. 2007 Sep;8(6):351-9. doi: 10.2174/138920207783406505.
The availability of high-throughput genomic data has motivated the development of numerous algorithms to infer gene regulatory networks. The validity of an inference procedure must be evaluated relative to its ability to infer a model network close to the ground-truth network from which the data have been generated. The input to an inference algorithm is a sample set of data and its output is a network. Since input, output, and algorithm are mathematical structures, the validity of an inference algorithm is a mathematical issue. This paper formulates validation in terms of a semi-metric distance between two networks, or the distance between two structures of the same kind deduced from the networks, such as their steady-state distributions or regulatory graphs. The paper sets up the validation framework, provides examples of distance functions, and applies them to some discrete Markov network models. It also considers approximate validation methods based on data for which the generating network is not known, the kind of situation one faces when using real data.
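As a rough illustration of the distance-based validation described above, the sketch below compares a ground-truth and an inferred network through their steady-state distributions. It is not the paper's own procedure. It assumes each network is a small ergodic Markov chain given by a row-stochastic transition matrix, and it uses the L1 norm as the distance, which is one possible choice. The names `steady_state`, `steady_state_distance`, `P_true`, `P_inferred`, and `P_other` are hypothetical.

```python
import numpy as np

def steady_state(P):
    """Stationary distribution pi of a row-stochastic matrix P (pi @ P = pi).

    Assumption: the chain is ergodic, so pi exists and is unique.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the constraint sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def steady_state_distance(P_a, P_b):
    """L1 distance between the steady-state distributions of two networks.

    Assumption: L1 is an illustrative choice of distance, not one taken from the paper.
    """
    return float(np.abs(steady_state(P_a) - steady_state(P_b)).sum())

# Hypothetical two-state networks, chosen only for illustration.
P_true = np.array([[0.9, 0.1],
                   [0.5, 0.5]])      # ground-truth network
P_inferred = np.array([[0.8, 0.2],
                       [0.5, 0.5]])  # output of some inference algorithm
P_other = np.array([[0.8, 0.2],
                    [1.0, 0.0]])     # different network, same steady state as P_true

print(steady_state_distance(P_true, P_inferred))  # positive: the steady states differ
print(steady_state_distance(P_true, P_other))     # approximately 0
```

The second comparison shows why such a distance is only a semi-metric on networks: `P_true` and `P_other` have different transition matrices but the same steady-state distribution, so the distance between them is zero even though the networks are not identical.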