

Exploring correlations in gene expression microarray data for maximum predictive-minimum redundancy biomarker selection and classification.

Affiliation

Department of Statistics, Operational Research and Numerical Analysis, University Nacional Educación a Distancia (UNED), Paseo Senda del Rey 9, 28040 Madrid, Spain.

Publication information

Comput Biol Med. 2013 Oct;43(10):1437-43. doi: 10.1016/j.compbiomed.2013.07.005. Epub 2013 Jul 13.

Abstract

An important issue in the analysis of gene expression microarray data is the extraction of valuable genetic interactions from high-dimensional data sets containing gene expression levels collected for a small sample of assays. Past and ongoing research has focused on biomarker selection for phenotype classification. Usually, many genes convey no useful information for classifying the outcome and should be removed from the analysis; on the other hand, some genes may be highly correlated, which reveals redundancy in the expressed information. In this paper we propose a method for selecting highly predictive genes with low redundancy in their expression levels. The predictive accuracy of the selection is assessed by means of Classification and Regression Trees (CART) models, which enable assessment of how well the selected genes classify the outcome variable and also uncover complex genetic interactions. The method is illustrated throughout the paper using a public-domain colon cancer gene expression data set.
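The abstract does not state the exact selection criterion, so the following is only a hedged illustration of the general idea of "highly predictive, low redundancy" selection, not the authors' method. It assumes a generic greedy filter similar in spirit to the well-known mRMR difference criterion: relevance is taken to be the absolute Pearson correlation between a gene and a 0/1 class label, and redundancy the mean absolute correlation with genes already selected. The function names `pearson` and `select_genes` and the toy data are hypothetical, and the CART evaluation step is not shown.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_genes(expr: List[List[float]], labels: Sequence[int], k: int) -> List[int]:
    """Hypothetical greedy max-relevance / min-redundancy selection.

    expr[g] holds the expression levels of gene g across all samples,
    labels holds the 0/1 class of each sample. Returns up to k gene indices
    in the order they were selected.
    """
    y = [float(v) for v in labels]
    # Assumed relevance: |correlation| between each gene and the class label.
    relevance = [abs(pearson(gene, y)) for gene in expr]
    selected: List[int] = []
    candidates = list(range(len(expr)))
    while candidates and len(selected) < k:
        def score(g: int) -> float:
            if not selected:
                return relevance[g]
            # Assumed redundancy: mean |correlation| with already-selected genes.
            redundancy = sum(abs(pearson(expr[g], expr[s])) for s in selected) / len(selected)
            return relevance[g] - redundancy
        # max() returns the first maximal candidate, so ties go to the lower index.
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: gene 1 is almost a copy of gene 0; gene 2 is less relevant
    # but carries less redundant information.
    expr = [
        [1, 2, 1, 5, 6, 5],
        [1, 2, 2, 5, 6, 5],
        [3, 2, 1, 5, 4, 3],
    ]
    labels = [0, 0, 0, 1, 1, 1]
    print(select_genes(expr, labels, 2))  # → [0, 2]
```

In this toy example gene 1 is the second most relevant gene, but because it is almost perfectly correlated with gene 0 its redundancy penalty outweighs its relevance, and the less relevant but less redundant gene 2 is chosen instead.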

