CATMA, a comprehensive genome-scale resource for silencing and transcript profiling of Arabidopsis genes.

Authors

Sclep Gert, Allemeersch Joke, Liechti Robin, De Meyer Björn, Beynon Jim, Bhalerao Rishikesh, Moreau Yves, Nietfeld Wilfried, Renou Jean-Pierre, Reymond Philippe, Kuiper Martin Tr, Hilson Pierre

Affiliation

Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Ghent, Belgium.

Publication

BMC Bioinformatics. 2007 Oct 18;8:400. doi: 10.1186/1471-2105-8-400.

DOI:10.1186/1471-2105-8-400

PMID:17945016

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2147040/

Abstract

BACKGROUND

The Complete Arabidopsis Transcript MicroArray (CATMA) initiative combines the efforts of laboratories in eight European countries to deliver gene-specific sequence tags (GSTs) for the Arabidopsis research community. The CATMA initiative offers the power and flexibility to regularly update the GST collection according to evolving knowledge about the gene repertoire. The GST amplicons can easily be reamplified and shared, subsets can be picked at will to print dedicated arrays, and the GSTs can be cloned and used for other functional studies. This ongoing initiative has already produced approximately 24,000 GSTs that have been made publicly available for spotted microarray printing and RNA interference.

RESULTS

GSTs from the CATMA version 2 repertoire (CATMAv2, created in 2002) were mapped onto the gene models from two independent Arabidopsis nuclear genome annotation efforts, TIGR5 and PSB-EuGène, to consolidate a list of genes that were targeted by previously designed CATMA tags. A total of 9,027 gene models were not tagged by any amplified CATMAv2 GST, and 2,533 amplified GSTs were no longer predicted to tag an updated gene model. To validate the efficacy of the GST mapping criteria and design rules, the predicted and experimentally observed hybridization characteristics associated with GST features were correlated in transcript profiling datasets obtained with the CATMAv2 microarray, confirming the reliability of this platform. To complete the CATMA repertoire, all 9,027 gene models for which no GST had yet been designed were processed with an adjusted version of the Specific Primer and Amplicon Design Software (SPADS). A total of 5,756 novel GSTs were designed and amplified by PCR from genomic DNA. Together with the pre-existing GST collection, this new addition constitutes the CATMAv3 repertoire. It comprises 30,343 unique amplified sequences that tag 24,202 and 23,009 protein-encoding nuclear gene models in the TAIR6 and EuGène genome annotations, respectively. To cover the remaining untagged genes, we identified 543 additional GSTs using less stringent design criteria and designed 990 sequence tags that match multiple members of gene families (Gene Family Tags or GFTs). These 1,533 features constitute the CATMAv4 addition.
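The bookkeeping behind counts such as "gene models not tagged by any GST" and "GSTs that no longer tag a gene model" can be illustrated with simple set operations. The sketch below is only an illustration under stated assumptions, not the authors' pipeline: the function name `summarize_tagging`, the tag and gene identifiers, and the input format (a mapping from each tag to the set of gene models it matches) are all hypothetical, and the actual mapping criteria are not given in this abstract.

```python
from typing import Dict, List, Set


def summarize_tagging(
    gene_models: Set[str], tag_hits: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Classify tags and gene models from a tag -> matched-gene-models mapping.

    Hypothetical helper: hits to genes outside `gene_models` are ignored,
    mimicking tags whose target is absent from an updated annotation.
    """
    # Restrict every tag's hits to the current annotation.
    valid_hits = {tag: hits & gene_models for tag, hits in tag_hits.items()}

    # Union of all gene models hit by at least one tag.
    tagged: Set[str] = set()
    for hits in valid_hits.values():
        tagged |= hits

    return {
        # Gene models with no tag at all (candidates for new tag design).
        "untagged_genes": sorted(gene_models - tagged),
        # Tags that no longer match any gene model in the annotation.
        "orphan_tags": sorted(t for t, h in valid_hits.items() if not h),
        # Tags matching exactly one gene model (gene-specific, GST-like).
        "specific_tags": sorted(t for t, h in valid_hits.items() if len(h) == 1),
        # Tags matching several gene models (family-level, GFT-like).
        "family_tags": sorted(t for t, h in valid_hits.items() if len(h) > 1),
    }


# Toy data with made-up identifiers.
genes = {"G1", "G2", "G3", "G4"}
hits = {"tagA": {"G1"}, "tagB": set(), "tagC": {"G2", "G3"}, "tagD": {"G9"}}
summary = summarize_tagging(genes, hits)
print(summary)
```

In this toy case, `G4` has no tag, `tagB` and `tagD` match nothing in the annotation, `tagA` is gene-specific, and `tagC` behaves like a gene family tag.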

CONCLUSION

To update the CATMA GST repertoire, we designed 7,289 additional sequence tags, bringing the total number of tagged TAIR6-annotated Arabidopsis nuclear protein-coding genes to 26,173. This resource is used both for the production of spotted microarrays and for the large-scale cloning of hairpin RNA silencing vectors. All information about the updated CATMA repertoire is available through the CATMA database at http://www.catma.org.
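As a quick consistency check, the tag counts reported in the abstract add up as follows. The variable names are illustrative only; the numbers are taken directly from the RESULTS and CONCLUSION sections.

```python
# Counts reported in the abstract.
novel_v3_gsts = 5756      # new GSTs designed with the adjusted SPADS (CATMAv3)
relaxed_gsts = 543        # GSTs found with less stringent design criteria
gene_family_tags = 990    # tags matching multiple gene family members (GFTs)

# The CATMAv4 addition is the sum of the relaxed GSTs and the GFTs.
v4_addition = relaxed_gsts + gene_family_tags
print(v4_addition)  # 1533

# All additional sequence tags designed in this update.
total_additional_tags = novel_v3_gsts + v4_addition
print(total_additional_tags)  # 7289
```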
