Hvidsten Torgeir R, Laegreid Astrid, Komorowski Jan
Department of Computer and Information Science, Norwegian University of Science and Technology, N-7491 Trondheim, Norway.
Bioinformatics. 2003 Jun 12;19(9):1116-23. doi: 10.1093/bioinformatics/btg047.
Microarray technology enables large-scale inference of the participation of genes in biological process from similar expression profiles. Our aim is to induce classificatory models from expression data and biological knowledge that can automatically associate genes with novel hypotheses of biological process.
We report a systematic supervised learning approach to predicting biological process from time series of gene expression data and biological knowledge. Biological knowledge is expressed using Gene Ontology, and this knowledge is associated with discriminatory expression-based features to form minimal decision rules. The resulting rule model is first evaluated, using cross-validation, on genes coding for proteins with known biological process roles. It is then used to generate hypotheses for genes for which no knowledge of participation in biological process could be found. The theoretical foundation for the methodology, based on rough sets, is outlined in the paper, and its practical application is demonstrated on a data set previously published by Cho et al. (Nat. Genet., 27, 48-54, 2001).
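The rough set idea behind such decision rules can be illustrated with a small sketch. It is not the Rosetta API and not the authors' feature construction: the gene identifiers, the interval attributes (`t0_t2`, `t2_t4`), the trend labels, and the process annotations below are all hypothetical toy values. The sketch groups genes that are indiscernible on their discretized expression features, then computes the lower approximation (genes whose features certainly imply the process) and the upper approximation (genes whose features possibly imply it).

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group object ids whose values agree on every listed attribute."""
    classes = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the rough set (lower, upper) approximations of `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attributes):
        if cls <= target:   # every gene in the class belongs to the target
            lower |= cls
        if cls & target:    # at least one gene in the class belongs to it
            upper |= cls
    return lower, upper


# Hypothetical toy decision table: discretized expression trends over two
# time intervals (condition attributes) and an annotated process (decision).
TABLE = {
    "g1": {"t0_t2": "up", "t2_t4": "down", "process": "DNA replication"},
    "g2": {"t0_t2": "up", "t2_t4": "down", "process": "DNA replication"},
    "g3": {"t0_t2": "up", "t2_t4": "flat", "process": "DNA replication"},
    "g4": {"t0_t2": "up", "t2_t4": "flat", "process": "cell cycle"},
    "g5": {"t0_t2": "down", "t2_t4": "up", "process": "cell cycle"},
}
CONDITIONS = ["t0_t2", "t2_t4"]

TARGET = {g for g, row in TABLE.items() if row["process"] == "DNA replication"}
LOWER, UPPER = approximations(TABLE, CONDITIONS, TARGET)

print("certain:", sorted(LOWER))
print("possible:", sorted(UPPER))
```

In this toy table, genes g3 and g4 share the same features but carry different annotations, so they fall in the boundary region (upper minus lower approximation). A rule such as "t0_t2 = up and t2_t4 = down implies DNA replication" holds without exception only for the lower approximation.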
The Rosetta system is available at http://www.idi.ntnu.no/~aleks/rosetta.