Department of Plant Science, Forestry and Agricultural Biotechnology Institute (FABI), University of Pretoria, Private Bag X20, 0028, South Africa.
Plant Methods. 2011 Oct 1;7:31. doi: 10.1186/1746-4811-7-31.
Microarray technology has matured over the past fifteen years into a cost-effective solution with established data analysis protocols for global gene expression profiling. The Agilent-016047 maize 44 K microarray was custom-designed from EST sequences, but only the reporter sequences, with their EST accession numbers, are publicly available. The following information is lacking: (a) the reporter-to-gene-model match, (b) the number of reporters per gene model, (c) the potential for cross-hybridization, (d) the sense/antisense orientation of reporters, (e) the position of each reporter on the B73 genome sequence (for eQTL studies), and (f) functional annotations of the genes represented by reporters. To address this, we developed a strategy to annotate the Agilent-016047 maize microarray and built a publicly accessible annotation database.
Genomic annotation of the 42,034 reporters on the Agilent-016047 maize microarray was based on BLASTN results of the 60-mer reporter sequences and their corresponding ESTs against the maize B73 RefGen v2 "Working Gene Set" (WGS) predicted transcripts and the genome sequence. The agreement between the EST, WGS transcript and gDNA BLASTN results was used to assign the reporters to six genomic annotation groups: (i) "annotation by sense gene model" (23,668 reporters); (ii) "annotation by antisense gene model" (4,330); (iii) "annotation by gDNA", without a WGS transcript hit (1,549); (iv) "annotation by EST", in which the EST from which the reporter was designed, but not the reporter itself, has a WGS transcript hit (3,390); (v) "ambiguous annotation" (2,608); and (vi) "inconclusive annotation" (6,489). Functional annotations of reporters were obtained by BLASTX and Blast2GO analysis of the corresponding WGS transcripts against GenBank.
The annotations are available in the Maize Microarray Annotation Database http://MaizeArrayAnnot.bi.up.ac.za/, as well as through a GBrowse annotation file that can be uploaded to the MaizeGDB genome browser as a custom track.
The database was used to re-annotate lists of differentially expressed genes reported in published case studies that used the Agilent-016047 maize microarray. Up to 85% of reporters in each list could be annotated with confidence by a single gene model; however, up to 10% of reporters had ambiguous annotations. Overall, more than 57% of reporters gave a measurable signal in tissues as diverse as anthers and leaves.
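The assignment of reporters to the six genomic annotation groups can be pictured as a decision procedure over pre-filtered BLASTN evidence. The Python sketch below is illustrative only: the `ReporterHits` summary, the gene model IDs, and especially the order of the decision rules (including the EST agreement check) are assumptions, not the exact criteria used to build the database. It also confirms that the published group sizes add up to all 42,034 reporters.

```python
from dataclasses import dataclass, field
from enum import Enum


class Group(Enum):
    """The six genomic annotation groups named in the text."""
    SENSE = "annotation by sense gene model"
    ANTISENSE = "annotation by antisense gene model"
    GDNA = "annotation by gDNA"
    EST = "annotation by EST"
    AMBIGUOUS = "ambiguous annotation"
    INCONCLUSIVE = "inconclusive annotation"


@dataclass
class ReporterHits:
    """HYPOTHETICAL summary of already-filtered BLASTN hits for one 60-mer reporter."""
    sense_models: set = field(default_factory=set)      # WGS gene models hit on the sense strand
    antisense_models: set = field(default_factory=set)  # WGS gene models hit on the antisense strand
    gdna_loci: int = 0                                   # distinct B73 genome loci hit by the reporter
    est_models: set = field(default_factory=set)         # WGS gene models hit by the source EST


def classify(h: ReporterHits) -> Group:
    """Assign a reporter to one group using an ASSUMED, simplified decision order."""
    models = h.sense_models | h.antisense_models
    if len(models) > 1:
        # Reporter matches several gene models, so cross-hybridization is possible.
        return Group.AMBIGUOUS
    if len(models) == 1:
        # Assumed agreement check: the source EST must not point to a different model.
        if h.est_models and not (h.est_models & models):
            return Group.AMBIGUOUS
        return Group.SENSE if h.sense_models else Group.ANTISENSE
    # From here on, the reporter itself has no WGS transcript hit.
    if h.gdna_loci == 1:
        return Group.GDNA
    if len(h.est_models) == 1:
        return Group.EST
    return Group.INCONCLUSIVE


# Group sizes reported in the text; together they account for every reporter.
PUBLISHED_COUNTS = {
    Group.SENSE: 23668,
    Group.ANTISENSE: 4330,
    Group.GDNA: 1549,
    Group.EST: 3390,
    Group.AMBIGUOUS: 2608,
    Group.INCONCLUSIVE: 6489,
}
assert sum(PUBLISHED_COUNTS.values()) == 42034

if __name__ == "__main__":
    hits = ReporterHits(sense_models={"modelA"}, gdna_loci=1, est_models={"modelA"})
    print(classify(hits).value)  # annotation by sense gene model
    for group, n in PUBLISHED_COUNTS.items():
        print(f"{group.value}: {n} ({100 * n / 42034:.1f}%)")
```

Checking for multiple gene-model hits first reflects the concern about cross-hybridization: a reporter that matches several transcripts cannot be tied to a single gene, whatever its other evidence.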
The Maize Microarray Annotation Database will assist users of the Agilent-016047 maize microarray in (i) refining gene lists for global expression analysis, and (ii) confirming the annotation of candidate genes before functional studies.
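Refining a gene list with such annotations amounts to a lookup followed by a filter: keep reporters that map confidently to one gene model, collapse reporters that share a gene model, and set the rest aside for inspection. The sketch below assumes a HYPOTHETICAL in-memory table (reporter ID to group and gene model) standing in for an export from the database; the reporter and model IDs are placeholders, and treating only the sense group as "confident" is an assumed policy that users may want to change.

```python
from collections import defaultdict

# HYPOTHETICAL annotation table: reporter ID -> (annotation group, gene model or None).
# IDs and layout are placeholders, not the database's actual export format.
ANNOTATION = {
    "rep1": ("annotation by sense gene model", "modelA"),
    "rep2": ("annotation by sense gene model", "modelA"),
    "rep3": ("annotation by antisense gene model", "modelB"),
    "rep4": ("ambiguous annotation", None),
    "rep5": ("annotation by sense gene model", "modelC"),
}

# Assumed policy: only sense gene-model annotations count as confident.
CONFIDENT_GROUPS = {"annotation by sense gene model"}


def refine(de_reporters, annotation, confident_groups=CONFIDENT_GROUPS):
    """Collapse differentially expressed reporters onto gene models.

    Returns (gene model -> reporters, [(reporter, group)] set aside, confident fraction).
    """
    genes = defaultdict(list)
    set_aside = []
    for rep in de_reporters:
        group, model = annotation.get(rep, ("not in table", None))
        if group in confident_groups and model is not None:
            genes[model].append(rep)  # several reporters may share one gene model
        else:
            set_aside.append((rep, group))
    n_confident = len(de_reporters) - len(set_aside)
    fraction = n_confident / len(de_reporters) if de_reporters else 0.0
    return dict(genes), set_aside, fraction


if __name__ == "__main__":
    de_list = ["rep1", "rep2", "rep3", "rep4", "rep5", "rep9"]
    genes, set_aside, fraction = refine(de_list, ANNOTATION)
    print(genes)  # {'modelA': ['rep1', 'rep2'], 'modelC': ['rep5']}
    print(set_aside)
    print(f"{fraction:.0%} of reporters mapped to a single gene model")
```

Collapsing by gene model keeps two reporters for the same gene from being counted as two differentially expressed genes, while the set-aside list flags candidates whose annotation should be checked before functional studies.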