Ferrari Francesco, Bortoluzzi Stefania, Coppe Alessandro, Sirota Alexandra, Safran Marilyn, Shmoish Michael, Ferrari Sergio, Lancet Doron, Danieli Gian Antonio, Bicciato Silvio
Department of Biomedical Sciences, University of Modena and Reggio Emilia, via G. Campi 287, 41100, Modena, Italy.
BMC Bioinformatics. 2007 Nov 15;8:446. doi: 10.1186/1471-2105-8-446.
Improvements in genome sequence annotation have revealed discrepancies in the original probeset-to-gene assignments of Affymetrix microarrays, as well as differences between the annotations and the actual alignments of probes to transcription products. In the current generation of Affymetrix human GeneChips, most probesets include probes that match transcripts from more than one gene, as well as probes that do not match any transcribed sequence.
We developed a novel set of custom Chip Definition Files (CDFs) and the corresponding Bioconductor libraries for Affymetrix human GeneChips, based on the information contained in the GeneAnnot database. GeneAnnot-based CDFs are composed of unique custom probesets, each including only probes that match a single gene.
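The core idea of a single-gene custom probeset can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `build_custom_probesets`, the `min_probes` threshold, and the probe and gene identifiers are all hypothetical, and the input is assumed to be a precomputed mapping from each probe to the genes its sequence matches.

```python
from collections import defaultdict


def build_custom_probesets(probe_to_genes, min_probes=1):
    """Group probes into one probeset per gene.

    probe_to_genes: dict mapping a probe ID to the list of gene IDs whose
    transcripts the probe matches (hypothetical input format).
    min_probes: hypothetical threshold; genes with fewer retained probes
    are dropped.
    """
    probesets = defaultdict(list)
    for probe_id, genes in probe_to_genes.items():
        unique_genes = set(genes)
        # Discard probes that match no gene or more than one gene.
        if len(unique_genes) != 1:
            continue
        (gene,) = unique_genes
        probesets[gene].append(probe_id)
    # Exactly one probeset per gene, with probes in a stable order.
    return {
        gene: sorted(probes)
        for gene, probes in probesets.items()
        if len(probes) >= min_probes
    }


# Illustrative data only; these identifiers are made up.
alignments = {
    "probe_01": ["GENE_A"],
    "probe_02": ["GENE_A", "GENE_B"],  # cross-matching probe, discarded
    "probe_03": [],                    # matches no transcript, discarded
    "probe_04": ["GENE_B"],
    "probe_05": ["GENE_A"],
}
custom = build_custom_probesets(alignments)
```

In this toy input, the cross-matching probe and the non-matching probe are excluded, and each remaining gene ends up with exactly one probeset.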
GeneAnnot-based custom CDFs allow a reliable reconstruction of expression levels and eliminate multiple probesets per gene, a redundancy that often leads to discordant expression signals for the same transcript when differential gene expression is the focus of the analysis. GeneAnnot CDFs are freely distributed and fully compliant with Affymetrix standards and with all available software for gene expression analysis. The CDF libraries are available from http://www.xlab.unimo.it/GA_CDF, along with supplementary information (installation guidelines and R code, CDF statistics, and analysis results).