Nykter Matti, Aho Tommi, Ahdesmäki Miika, Ruusuvuori Pekka, Lehmussola Antti, Yli-Harja Olli
Institute of Signal Processing, Tampere University of Technology, Tampere, Finland.
BMC Bioinformatics. 2006 Jul 18;7:349. doi: 10.1186/1471-2105-7-349.
Microarray technologies have become common tools in biological research. As a result, a need has emerged for effective computational methods for data analysis. Numerous algorithms have been proposed for analyzing the data. However, an objective evaluation of the proposed algorithms is not possible because biological ground truth information is lacking. To overcome this fundamental problem, the use of simulated microarray data for algorithm validation has been proposed.
We present a microarray simulation model which can be used to validate different kinds of data analysis algorithms. The proposed model is unique in the sense that it includes all the steps that affect the quality of real microarray data. These steps include simulating the biological ground truth data, applying error models specific to the biology and to the measurement technology, and finally simulating the microarray slide manufacturing and hybridization. After all these steps are taken into account, the simulated data has realistic biological and statistical characteristics. The applicability of the proposed model is demonstrated by several examples.
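The three stages named above (ground truth, error models, slide effects) can be illustrated with a minimal sketch. This is not the authors' implementation: all function names, parameter values, and the generic additive-plus-multiplicative noise form are assumptions chosen only to show how such stages might be chained for a two-channel array.

```python
import math
import random


def simulate_ground_truth(n_genes, n_diff, log2_fold_change, rng):
    """Stage 1 (hypothetical): true log2 expression for reference and sample."""
    reference = [rng.uniform(6.0, 14.0) for _ in range(n_genes)]
    sample = list(reference)
    # Pick the genes that are truly differentially expressed.
    diff_genes = sorted(rng.sample(range(n_genes), n_diff))
    for g in diff_genes:
        sample[g] += log2_fold_change
    return reference, sample, diff_genes


def apply_measurement_error(log2_values, rng, mult_sd=0.1, add_sd=20.0,
                            background=100.0):
    """Stage 2 (assumed form): intensity = background + true * exp(eta) + eps."""
    noisy = []
    for v in log2_values:
        true_intensity = 2.0 ** v
        eta = rng.gauss(0.0, mult_sd)  # multiplicative error
        eps = rng.gauss(0.0, add_sd)   # additive error
        noisy.append(background + true_intensity * math.exp(eta) + eps)
    return noisy


def apply_slide_effects(intensities, n_cols, rng, gradient=0.2, p_missing=0.01):
    """Stage 3 (assumed form): linear spatial gradient plus failed spots (None)."""
    out = []
    for i, x in enumerate(intensities):
        col = i % n_cols
        scale = 1.0 + gradient * col / max(n_cols - 1, 1)
        out.append(None if rng.random() < p_missing else x * scale)
    return out


def simulate_two_channel(n_genes=1000, n_diff=50, seed=0):
    """Chain the three stages; returns both channels and the true gene set."""
    rng = random.Random(seed)
    ref, smp, diff_genes = simulate_ground_truth(n_genes, n_diff, 2.0, rng)
    channels = []
    for truth in (ref, smp):
        noisy = apply_measurement_error(truth, rng)
        # Spot failures are drawn independently per channel in this sketch.
        channels.append(apply_slide_effects(noisy, n_cols=25, rng=rng))
    return channels[0], channels[1], diff_genes
```

Because the list of truly differential genes is returned alongside the simulated intensities, the output of an analysis algorithm can be scored against a known answer, which is the purpose of simulation described above.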
The proposed microarray simulation model is modular and can be used in different kinds of applications. It includes several previously proposed error models, and it can be used with different types of input data. The model can simulate both spotted two-channel and oligonucleotide-based single-channel microarrays. Together, these properties make the model a valuable tool, for example in the validation of data analysis algorithms.