Prediction of biologically significant components from microarray data: Independently Consistent Expression Discriminator (ICED).

Author information

Bijlani Rahul, Cheng Yinhe, Pearce David A, Brooks Andrew I, Ogihara Mitsunori

Affiliation

Department of Computer Science, University of Rochester School of Medicine and Dentistry, NY 14642, USA.

Publication information

Bioinformatics. 2003 Jan;19(1):62-70. doi: 10.1093/bioinformatics/19.1.62.

Abstract

MOTIVATION

Class distinction is a supervised learning approach that has been successfully employed in the analysis of high-throughput gene expression data. Identifying a set of genes that predicts differential biological states allows basic and clinical scientific approaches to the diagnosis of disease to be developed. The Independently Consistent Expression Discriminator (ICED) was designed to provide a more biologically relevant search criterion during predictor selection by embracing the inherent variability of gene expression in any biological state. ICED has four components: (i) normalization of the raw data; (ii) assignment of weights to genes from both classes; (iii) vote counting to determine the optimal number of predictor genes for class distinction; and (iv) calculation of prediction strengths for the classification results. The search criterion employed by ICED is designed to identify not only genes that are expressed consistently at one level in one class and at a consistently different level in the other class, but also genes that are variable in one class and consistent in the other. The result is a novel approach for accurately selecting biologically relevant predictors of differential disease states from a small number of microarray samples.
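The abstract names ICED's four stages but not their formulas. As a rough illustration of how such a pipeline fits together, here is a minimal Python sketch that substitutes the weighted-voting scheme of Golub et al. (1999) for ICED's own weighting: each gene's weight is its signal-to-noise ratio (mu0 - mu1) / (sd0 + sd1), each gene votes relative to the midpoint between the two class means, and prediction strength is (V_win - V_lose) / (V_win + V_lose). The per-sample z-score normalization, the top-|weight| gene selection, the leave-one-out rule for choosing the number of genes, the toy data, and all function names (`normalize`, `gene_weight`, `train`, `predict`, `choose_n_genes`) are hypothetical assumptions rather than the published method. In particular, signal-to-noise weights do not reward the "variable in one class, consistent in the other" pattern that ICED's search criterion targets.

```python
from statistics import mean, pstdev


def normalize(sample):
    """Step (i), ASSUMED: z-score one sample's expression values across its genes."""
    m = mean(sample)
    sd = pstdev(sample) or 1.0  # guard against a constant sample
    return [(v - m) / sd for v in sample]


def gene_weight(a, b):
    """Step (ii) STAND-IN (Golub-style, not ICED): (mu_a - mu_b) / (sd_a + sd_b)."""
    spread = (pstdev(a) + pstdev(b)) or 1e-9  # avoid division by zero
    return (mean(a) - mean(b)) / spread


def train(X, y, n_genes):
    """Keep the n_genes genes with the largest |weight|; y holds labels 0 or 1."""
    class0 = [x for x, label in zip(X, y) if label == 0]
    class1 = [x for x, label in zip(X, y) if label == 1]
    ranked = []
    for g in range(len(X[0])):
        a = [s[g] for s in class0]
        b = [s[g] for s in class1]
        w = gene_weight(a, b)
        boundary = (mean(a) + mean(b)) / 2  # midpoint between the class means
        ranked.append((abs(w), g, w, boundary))
    ranked.sort(reverse=True)  # strongest predictors first
    return [(g, w, bnd) for _, g, w, bnd in ranked[:n_genes]]


def predict(model, x):
    """Weighted vote, then step (iv): strength = (V_win - V_lose) / (V_win + V_lose)."""
    votes = [0.0, 0.0]
    for g, w, bnd in model:
        v = w * (x[g] - bnd)  # v > 0 favours class 0 because w uses mean0 - mean1
        if v > 0:
            votes[0] += v
        else:
            votes[1] -= v
    winner = 0 if votes[0] >= votes[1] else 1
    total = votes[0] + votes[1]
    strength = abs(votes[0] - votes[1]) / total if total else 0.0
    return winner, strength


def choose_n_genes(X, y, candidates):
    """Step (iii) STAND-IN, ASSUMED: pick the predictor-set size with the most
    correct leave-one-out votes (the earliest candidate wins ties)."""
    best_n, best_correct = candidates[0], -1
    for n in candidates:
        correct = 0
        for i in range(len(X)):
            model = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n)
            correct += predict(model, X[i])[0] == y[i]
        if correct > best_correct:
            best_n, best_correct = n, correct
    return best_n


# Toy usage with invented values: 4 genes, 3 samples per class.
raw = [
    [5.0, 1.0, 3.0, 2.9], [5.2, 0.8, 3.1, 3.0], [4.8, 1.1, 2.9, 3.1],  # class 0
    [1.0, 5.0, 3.0, 3.0], [0.9, 5.3, 3.1, 2.8], [1.2, 4.7, 2.8, 3.2],  # class 1
]
labels = [0, 0, 0, 1, 1, 1]
X = [normalize(s) for s in raw]
n = choose_n_genes(X, labels, [1, 2, 4])
model = train(X, labels, n)
label, strength = predict(model, normalize([5.1, 0.9, 3.0, 3.0]))
```

Ties in leave-one-out accuracy go to the smaller predictor set, in keeping with the goal of a compact set of predictor genes; note that a single-gene model always yields a prediction strength of 1.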

RESULTS

In this study, ICED was used to analyze the large AML/ALL training and test data sets (Golub et al., 1999, Science, 286, 531-537), as well as a smaller data set, generated for this study, from an animal model of the childhood neurodegenerative disorder Batten disease. Both analyses correctly predicted biologically relevant perturbations that can be used for disease classification, irrespective of sample size. Furthermore, the results provide candidate proteins for future studies of the disease process and for the identification of potential targets for therapeutic intervention.
