Department of Electrical Engineering and Computer Science, University of Kansas, 1520 West 15th Street, Lawrence, KS 66045, USA.
Bioinformatics. 2009 Nov 15;25(22):2945-54. doi: 10.1093/bioinformatics/btp521. Epub 2009 Aug 31.
The sequencing of whole genomes from various species has provided a wealth of genetic information. Making use of the vast amounts of data now available requires computer-based analysis techniques.
We propose a hidden Markov model (HMM)-based algorithm that detects groups of genes functionally similar to a set of input genes from microarray expression data. A subset of the microarray experiments is selected based on a set of related input genes. HMMs are trained on the input genes and on a group of random input gene sets to provide significance estimates. Every gene in the microarray is scored using all HMMs, and significant matches with the input genes are retained. We ran this algorithm on the Drosophila life cycle microarray data set, with the KEGG pathways for cell cycle and translation factors as input gene sets. The resulting gene sets show high functional similarity, increasing our biological insight into gene pathways and KEGG annotations. The algorithm performed very well compared to the Signature Algorithm and a purely correlation-based approach.
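The scoring and significance steps described above can be illustrated with a minimal sketch. The abstract does not specify the HMM topology, the emission model, or how significance is computed, so everything here is an assumption for illustration: expression profiles are taken to be discretized into integer symbols (for example 0 = down, 1 = unchanged, 2 = up), the HMMs have discrete emissions with already-trained parameters, and significance is an empirical p-value comparing the input-gene model's score against the random-set models' scores. The function names are hypothetical and are not taken from the authors' Java code.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | HMM) computed with the forward algorithm in log space.

    obs   : list of integer symbols (assumed discretized expression levels)
    start : start[s] is the probability of beginning in state s
    trans : trans[r][s] is the probability of moving from state r to state s
    emit  : emit[s][o] is the probability that state s emits symbol o
    """
    n = len(start)
    alpha = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[r] + _log(trans[r][s]) for r in range(n)])
            + _log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def empirical_p_value(input_score, random_scores):
    """Share of random-set models that score a gene at least as well as the
    input-gene model, with add-one smoothing so the estimate is never 0.
    This is an assumed significance rule, not the paper's stated one."""
    hits = sum(1 for r in random_scores if r >= input_score)
    return (hits + 1) / (len(random_scores) + 1)


def significant_genes(profiles, input_hmm, random_hmms, alpha_level=0.05):
    """Score every gene with all HMMs and keep the significant matches.

    profiles    : dict mapping gene name -> list of integer symbols
    input_hmm   : (start, trans, emit) tuple trained on the input genes
    random_hmms : list of (start, trans, emit) tuples from random gene sets
    """
    kept = []
    for gene, obs in profiles.items():
        score = forward_log_likelihood(obs, *input_hmm)
        background = [forward_log_likelihood(obs, *h) for h in random_hmms]
        if empirical_p_value(score, background) <= alpha_level:
            kept.append(gene)
    return kept
```

With this rule the smallest attainable p-value is 1 / (number of random models + 1), so the number of random gene sets bounds how strict the significance threshold can be.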
Java source code and data sets are available at http://www.ittc.ku.edu/~xwchen/software.htm