Angelini Claudia, De Canditiis Daniela, Mutarelli Margherita, Pensky Marianna
Istituto per le Applicazioni del Calcolo.
Stat Appl Genet Mol Biol. 2007;6:Article24. doi: 10.2202/1544-6115.1299. Epub 2007 Sep 16.
The objective of the present paper is to develop a truly functional Bayesian method specifically designed for time series microarray data. The method allows one to identify differentially expressed genes in a time-course microarray experiment, to rank them and to estimate their expression profiles. Each gene expression profile is modeled as an expansion over some orthonormal basis, where the coefficients and the number of basis functions are estimated from the data. The proposed procedure deals successfully with various technical difficulties that arise in typical microarray experiments, such as a small number of observations, non-uniform sampling intervals and missing or replicated data. The procedure allows one to account for various types of errors and offers a good compromise between nonparametric techniques and techniques based on normality assumptions. In addition, all evaluations are performed using analytic expressions, so the entire procedure requires very small computational effort. The procedure is studied using both simulated and real data, and is compared with competing recent approaches. Finally, the procedure is applied to a case study of a human breast cancer cell line stimulated with estrogen. We succeeded in finding new significant genes that were not marked in an earlier work on the same dataset.
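The basis-expansion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' Bayesian procedure: it assumes a Legendre polynomial basis (the abstract says only "some orthonormal basis"), fits the coefficients by ordinary least squares, and uses BIC as a simple stand-in for choosing the number of basis functions. The names `design_matrix` and `fit_profile` and the `max_basis` parameter are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def design_matrix(t, n_basis, t_min, t_max):
    """Evaluate the first n_basis orthonormal Legendre functions at times t."""
    # Map the sampling times onto [-1, 1], where Legendre polynomials are orthogonal.
    x = 2.0 * (np.asarray(t, dtype=float) - t_min) / (t_max - t_min) - 1.0
    # Columns are P_0(x), ..., P_{n_basis-1}(x).
    V = legendre.legvander(x, n_basis - 1)
    # Rescale so each function has unit L2 norm on [-1, 1].
    scale = np.sqrt((2 * np.arange(n_basis) + 1) / 2.0)
    return V * scale


def fit_profile(t, y, max_basis=4):
    """Fit one gene's profile; return (number of basis functions, coefficients).

    Assumption: least squares plus BIC replaces the Bayesian estimation
    described in the abstract. Missing values are NaN; replicated and
    non-uniformly spaced time points are simply extra rows.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~np.isnan(y)          # drop missing observations
    t, y = t[keep], y[keep]
    n = y.size
    best = None
    for n_basis in range(1, min(max_basis, n - 1) + 1):
        D = design_matrix(t, n_basis, t.min(), t.max())
        coef, *_ = np.linalg.lstsq(D, y, rcond=None)
        rss = float(np.sum((y - D @ coef) ** 2))
        # Floor the residual sum of squares to keep the logarithm finite.
        bic = n * np.log(max(rss, 1e-12) / n) + n_basis * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, n_basis, coef)
    return best[1], best[2]


if __name__ == "__main__":
    # Non-uniform sampling times with one replicate (t = 4) and one missing value.
    t_obs = np.array([1, 2, 4, 4, 8, 12, 16, 20, 24, 28, 32], dtype=float)
    y_obs = 0.01 * t_obs**2 - 0.2 * t_obs + 1.0
    y_obs[5] = np.nan
    n_basis, coef = fit_profile(t_obs, y_obs)
    print(n_basis, coef)
```

A gene whose selected expansion contains only the constant term would, in this simplified picture, be a candidate for "not differentially expressed"; the paper itself ranks genes using a Bayesian criterion rather than BIC.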