New feature subset selection procedures for classification of expression profiles.

Authors

Bø Trond, Jonassen Inge

Affiliation

Department of Informatics, University of Bergen, N-5020 Bergen, Norway.

Publication information

Genome Biol. 2002;3(4):RESEARCH0017. doi: 10.1186/gb-2002-3-4-research0017. Epub 2002 Mar 14.

Abstract

BACKGROUND

Methods for extracting useful information from the datasets produced by microarray experiments are at present of much interest. Here we present new methods for finding gene sets that are well suited for distinguishing experiment classes, such as healthy versus diseased tissues. Our methods are based on evaluating genes in pairs and evaluating how well a pair in combination distinguishes two experiment classes. We tested the ability of our pair-based methods to select gene sets that generalize the differences between experiment classes and compared the performance relative to two standard methods. To assess the ability to generalize class differences, we studied how well the gene sets we select are suited for learning a classifier.
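
The abstract does not spell out how a gene pair is scored, so the following is only a minimal sketch of the general idea, under stated assumptions. Each pair of genes is projected onto a simple variance-weighted discriminant axis, and the separation of the two classes along that axis is measured with a t-statistic. The scoring function, the names `t_score`, `gene_score`, `pair_score` and `rank_pairs`, and the toy data are all illustrative assumptions, not the authors' published procedure.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

# Toy expression data (hypothetical): rows are samples, columns are genes.
SAMPLES = [
    [1.0, 2.0, 5.0], [2.0, 1.0, 4.0], [1.0, 2.2, 4.0], [2.0, 0.8, 5.0],  # class 0
    [2.0, 3.0, 4.0], [3.0, 2.0, 5.0], [2.0, 3.2, 5.0], [3.0, 1.8, 4.0],  # class 1
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def t_score(a, b):
    """Absolute two-sample t-statistic (unpooled form) between two value lists."""
    diff = abs(mean(a) - mean(b))
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / se


def split_by_class(values, labels):
    """Split one value per sample into (class 0 values, class 1 values)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return a, b


def gene_score(samples, labels, g):
    """Score a single gene on its own (the one-gene-at-a-time baseline)."""
    return t_score(*split_by_class([s[g] for s in samples], labels))


def pair_score(samples, labels, i, j):
    """Score genes i and j jointly (assumed scoring rule, for illustration)."""
    weights = []
    for g in (i, j):
        a, b = split_by_class([s[g] for s in samples], labels)
        pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (
            len(a) + len(b) - 2
        )
        # Weight = class mean difference scaled by the pooled variance.
        weights.append((mean(b) - mean(a)) / pooled if pooled > 0 else 0.0)
    projected = [weights[0] * s[i] + weights[1] * s[j] for s in samples]
    return t_score(*split_by_class(projected, labels))


def rank_pairs(samples, labels):
    """Return (score, (i, j)) for every gene pair, best pair first."""
    n_genes = len(samples[0])
    scored = [
        (pair_score(samples, labels, i, j), (i, j))
        for i, j in combinations(range(n_genes), 2)
    ]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, pair in rank_pairs(SAMPLES, LABELS):
        print(pair, round(score, 2))
```

In this toy data, genes 0 and 1 each separate the classes only weakly, yet the pair (0, 1) ranks first because their noise partly cancels in combination. Scoring all pairs costs on the order of the square of the number of genes, which is worth keeping in mind for arrays with thousands of genes.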

RESULTS

We show that the gene sets selected by our methods outperform the standard methods, in some cases by a large margin, in terms of cross-validation prediction accuracy of the learned classifier. We show that on two public datasets, accurate diagnoses can be made using only 15-30 genes. Our results have implications for how to select marker genes and how many gene measurements are needed for diagnostic purposes.
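
To make "cross-validation prediction accuracy" concrete, here is a small sketch of leave-one-out cross-validation for a fixed gene subset. The abstract names neither the cross-validation variant nor the classifier, so leave-one-out, the nearest-centroid classifier, the function names and the toy data are all assumptions chosen for brevity.

```python
from statistics import mean

# Toy expression data (hypothetical): rows are samples, columns are genes.
SAMPLES = [
    [1.0, 2.0, 5.0], [2.0, 1.0, 4.0], [1.0, 2.2, 4.0], [2.0, 0.8, 5.0],  # class 0
    [2.0, 3.0, 4.0], [3.0, 2.0, 5.0], [2.0, 3.2, 5.0], [3.0, 1.8, 4.0],  # class 1
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def predict(train_x, train_y, sample, genes):
    """Nearest-centroid prediction restricted to the chosen genes."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [mean(r[g] for r in rows) for g in genes]
        dist = sum((sample[g] - m) ** 2 for g, m in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(samples, labels, genes):
    """Hold out each sample once, train on the rest, and count correct calls."""
    hits = 0
    for k in range(len(samples)):
        train_x = samples[:k] + samples[k + 1:]
        train_y = labels[:k] + labels[k + 1:]
        hits += predict(train_x, train_y, samples[k], genes) == labels[k]
    return hits / len(samples)


if __name__ == "__main__":
    print("genes 0 and 1:", loocv_accuracy(SAMPLES, LABELS, [0, 1]))
    print("gene 0 alone: ", loocv_accuracy(SAMPLES, LABELS, [0]))
```

The sketch evaluates a gene subset that is fixed in advance. For an unbiased estimate on real data, the gene selection step itself should be repeated inside each fold using only the training samples, otherwise the held-out sample has already influenced which genes were chosen.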

CONCLUSION

When looking for differential expression between experiment classes, it may not be sufficient to look at each gene in a separate universe. Evaluating combinations of genes reveals interesting information that will not be discovered otherwise. Our results show that class prediction can be improved by taking advantage of this extra information.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a55b/115205/d8805e93b8ce/gb-2002-3-4-research0017-1.jpg
