Density based pruning for identification of differentially expressed genes from microarray data.

Affiliation

Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA.

Publication information

BMC Genomics. 2010 Nov 2;11 Suppl 2(Suppl 2):S3. doi: 10.1186/1471-2164-11-S2-S3.

Abstract

MOTIVATION

Identifying differentially expressed genes from microarray datasets is one of the most important analyses in microarray data mining. Popular algorithms such as the t-test rank genes by a single statistic. The false positive rate of these methods can be reduced by considering other features of differentially expressed genes.

RESULTS

We propose a pattern recognition strategy for identifying differentially expressed genes. Genes are mapped to a two-dimensional feature space composed of the average difference in gene expression and the average expression level. A density-based pruning algorithm (DB pruning) is developed to isolate potential differentially expressed genes, which usually lie in the sparse boundary region of this space. The biases of popular algorithms for identifying differentially expressed genes are characterized visually. Experiments on 17 datasets from the Gene Expression Omnibus (GEO) database with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test, rank product, and fold change.
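
The pruning idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the density estimator or any thresholds, so the neighbor-count density, the `radius` and `keep_fraction` parameters, the function names, and the synthetic data below are all assumptions made for illustration. The sketch maps each gene to (average difference, average expression level), counts neighbors within a fixed radius in the standardized feature space, and keeps only the genes in the sparsest region.

```python
import math
import random
from statistics import mean, pstdev


def gene_features(control, treated):
    """Map each gene to (average difference, average expression level)."""
    feats = []
    for c, t in zip(control, treated):
        diff = mean(t) - mean(c)             # average difference between conditions
        level = mean(list(c) + list(t))      # average expression over all samples
        feats.append((diff, level))
    return feats


def standardize(feats):
    """Scale each feature to zero mean and unit variance so distances are comparable."""
    cols = []
    for col in zip(*feats):
        mu = mean(col)
        sd = pstdev(col) or 1.0              # guard against a constant feature
        cols.append([(v - mu) / sd for v in col])
    return list(zip(*cols))


def neighbor_counts(points, radius):
    """Density proxy (assumed): number of other points within `radius`. O(n^2)."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def density_prune(control, treated, radius=0.5, keep_fraction=0.1):
    """Return sorted indices of the genes that survive pruning.

    `radius` and `keep_fraction` are hypothetical tuning parameters.
    """
    points = standardize(gene_features(control, treated))
    counts = neighbor_counts(points, radius)
    order = sorted(range(len(points)), key=lambda i: counts[i])  # sparsest first
    n_keep = max(1, int(round(keep_fraction * len(points))))
    return sorted(order[:n_keep])


def make_demo_data(seed=0):
    """Synthetic log-scale data: genes 0-4 are shifted by +3, genes 5-199 are not."""
    rng = random.Random(seed)

    def noise():
        return rng.uniform(-0.3, 0.3)

    control, treated = [], []
    for base in (4.0, 6.0, 8.0, 10.0, 12.0):
        control.append([base + noise() for _ in range(3)])
        treated.append([base + 3.0 + noise() for _ in range(3)])
    for _ in range(195):
        base = rng.uniform(6.0, 10.0)
        control.append([base + noise() for _ in range(3)])
        treated.append([base + noise() for _ in range(3)])
    return control, treated


control, treated = make_demo_data()
kept = density_prune(control, treated, radius=0.5, keep_fraction=0.1)

# Rank only the surviving genes, here by absolute average difference.
feats = gene_features(control, treated)
ranked = sorted(kept, key=lambda i: abs(feats[i][0]), reverse=True)
print(len(kept), ranked[:5])
```

In this sketch the final ranking by absolute average difference stands in for a fold-change ranking on log-scale data; a t-statistic or rank product could be applied to the surviving genes in the same way. The quadratic neighbor search is adequate for a few hundred genes, but a real microarray with tens of thousands of probes would need a spatial index or a grid-based density estimate.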

CONCLUSIONS

Density-based pruning of non-differentially expressed genes is an effective way to enhance statistical-test-based algorithms for identifying differentially expressed genes. It increases the number of true differentially expressed genes identified by t-test, rank product, and fold change by 11% to 50%. The source code of DB pruning is freely available on our website, http://mleg.cse.sc.edu/degprune.
