A distribution free summarization method for Affymetrix GeneChip arrays.

Authors

Chen Zhongxue, McGee Monnie, Liu Qingzhong, Scheuermann Richard H

Affiliation

Department of Statistical Science, Southern Methodist University, Dallas, TX 75275, USA.

Publication information

Bioinformatics. 2007 Feb 1;23(3):321-7. doi: 10.1093/bioinformatics/btl609. Epub 2006 Dec 5.

Abstract

MOTIVATION

Affymetrix GeneChip arrays require summarization to combine the probe-level intensities into one value representing the expression level of a gene. However, probe intensity measurements are expected to be affected by different levels of non-specific hybridization and of cross-hybridization to non-specific transcripts. Here, we present a new summarization technique, the Distribution Free Weighted method (DFW), which uses information about the variability in probe behavior to estimate the extent of non-specific and cross-hybridization for each probe. The contribution of each probe is weighted accordingly during summarization, without making any distributional assumptions for the probe-level data.
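
To make the idea of variability-based probe weighting concrete, the following is a minimal sketch only. It is not the DFW algorithm, whose formula the abstract does not give. The function name `weighted_probe_summary`, the consensus-pattern step, and the weight `1 / (1 + inconsistency)` are all assumptions chosen for illustration; the sketch relies only on medians and makes no distributional assumption.

```python
from statistics import median


def weighted_probe_summary(log_pm):
    """Collapse a probes-by-arrays table of log2 intensities to one value per array.

    Illustrative scheme (an assumption, NOT the published DFW formula):
    probes whose pattern across arrays departs from the probe-set consensus
    receive less weight.
    """
    n_arrays = len(log_pm[0])

    # Center each probe on its own median so only its pattern across arrays remains.
    centered = []
    for row in log_pm:
        m = median(row)
        centered.append([x - m for x in row])

    # Consensus pattern of the probe set: per-array median over probes.
    consensus = [median(row[j] for row in centered) for j in range(n_arrays)]

    # Per-probe inconsistency: median absolute deviation from the consensus.
    inconsistency = [
        median(abs(x - c) for x, c in zip(row, consensus)) for row in centered
    ]

    # Hypothetical weight: smaller inconsistency gives a larger weight.
    raw = [1.0 / (1.0 + d) for d in inconsistency]
    total = sum(raw)
    weights = [w / total for w in raw]

    # Weighted average of the probes, computed separately for each array.
    summary = [
        sum(w * row[j] for w, row in zip(weights, log_pm)) for j in range(n_arrays)
    ]
    return weights, summary


# Toy probe set: two probes that track each other and one erratic probe.
probes = [
    [8.0, 9.0, 10.0, 11.0],
    [8.5, 9.5, 10.5, 11.5],
    [10.0, 7.0, 12.0, 6.0],
]
weights, summary = weighted_probe_summary(probes)
```

In this toy input the third probe disagrees with the other two, so it receives the smallest weight and contributes least to the per-array summary.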

RESULTS

We compare DFW with several popular summarization methods on spike-in datasets, via both our own calculations and the 'Affycomp II' competition. The results show that DFW outperforms other methods when sensitivity and specificity are considered simultaneously. With the Affycomp spike-in datasets, the area under the receiver operating characteristic curve for DFW is nearly 1.0 (a perfect value), indicating that DFW can identify all differentially expressed genes with only a few false positives. The approach is also computationally faster than most other methods in current use.
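
As a reminder of what an area under the ROC curve near 1.0 means, the sketch below computes the AUC as the fraction of (spiked-in, non-spiked) gene pairs in which the spiked-in gene gets the higher score, counting ties as one half. The function name `roc_auc` and the toy scores are hypothetical and are not taken from the Affycomp data.

```python
def roc_auc(scores, is_spiked):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen spiked-in gene scores
    higher than a randomly chosen non-spiked gene (ties count as 0.5).
    """
    pos = [s for s, y in zip(scores, is_spiked) if y]
    neg = [s for s, y in zip(scores, is_spiked) if not y]
    if not pos or not neg:
        raise ValueError("need at least one spiked-in and one non-spiked gene")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical differential-expression scores for six genes; the first two are spiked in.
scores = [3.1, 2.8, 0.2, 0.4, 2.9, 0.1]
labels = [True, True, False, False, False, False]
auc_with_one_false_positive = roc_auc(scores, labels)  # 0.875

# Perfect separation: every spiked-in gene outranks every other gene.
auc_perfect = roc_auc([3.1, 2.8, 0.2, 0.4], [True, True, False, False])  # 1.0
```

A single non-spiked gene that outranks one spiked-in gene already pulls the toy AUC down to 0.875, which is why a value near 1.0 implies that very few false positives rank among the true positives.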

AVAILABILITY

The R code for DFW is available upon request.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
