Mendes Pedro, Sha Wei, Ye Keying
Virginia Bioinformatics Institute, USA.
Bioinformatics. 2003 Oct;19 Suppl 2:ii122-9. doi: 10.1093/bioinformatics/btg1069.
Large-scale gene expression profiling generates data sets that are rich in observed features but poor in numbers of observations. The analysis of such data sets is a challenge that has been the object of vigorous research. The algorithms in use for this purpose have been poorly documented and rarely compared objectively, which leaves the outcomes of the analyses uncertain. One way to test such analysis algorithms objectively is to apply them to computational gene network models for which the mechanisms are completely known.
We present a system that generates random artificial gene networks according to well-defined topological and kinetic properties. These networks are used to run in silico experiments that simulate real laboratory microarray experiments. Noise with controlled properties is added to the simulation results several times, emulating measurement replicates, before expression ratios are calculated.
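The replicate-and-ratio step can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian noise model with a fixed coefficient of variation, the averaging of replicates before taking a log2 ratio, the function names, and the example expression values are all assumptions made for illustration only.

```python
import math
import random
from statistics import mean


def add_replicate_noise(values, n_replicates, cv, rng):
    """Return n_replicates noisy copies of a simulated expression profile.

    Assumption: noise is Gaussian with standard deviation cv * value.
    Values are clamped to a small positive floor so ratios stay defined.
    """
    return [
        [max(v + rng.gauss(0.0, cv * v), 1e-12) for v in values]
        for _ in range(n_replicates)
    ]


def log2_ratios(treated_reps, control_reps):
    """Average the replicates per gene, then compute log2(treated / control)."""
    n_genes = len(control_reps[0])
    ratios = []
    for g in range(n_genes):
        treated_mean = mean(rep[g] for rep in treated_reps)
        control_mean = mean(rep[g] for rep in control_reps)
        ratios.append(math.log2(treated_mean / control_mean))
    return ratios


# Hypothetical noise-free simulation outputs for four genes.
control = [1.0, 2.0, 0.5, 4.0]
treated = [2.0, 2.0, 0.25, 4.0]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
control_reps = add_replicate_noise(control, n_replicates=3, cv=0.1, rng=rng)
treated_reps = add_replicate_noise(treated, n_replicates=3, cv=0.1, rng=rng)
ratios = log2_ratios(treated_reps, control_reps)
```

Setting `cv` to zero recovers the noise-free ratios, which gives a simple way to check how much a given noise level distorts the known ground truth.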
The data sets and kinetic models described here are available from http://www.vbi.vt.edu/~mendes/AGN/ as biochemical dynamic models in SBML and Gepasi formats.