Gorodkin Jan, Cirera Susanna, Hedegaard Jakob, Gilchrist Michael J, Panitz Frank, Jørgensen Claus, Scheibye-Knudsen Karsten, Arvin Troels, Lumholdt Steen, Sawera Milena, Green Trine, Nielsen Bente J, Havgaard Jakob H, Rosenkilde Carina, Wang Jun, Li Heng, Li Ruiqiang, Liu Bin, Hu Songnian, Dong Wei, Li Wei, Yu Jun, Wang Jian, Staerfeldt Hans-Henrik, Wernersson Rasmus, Madsen Lone B, Thomsen Bo, Hornshøj Henrik, Bujie Zhan, Wang Xuegang, Wang Xuefei, Bolund Lars, Brunak Søren, Yang Huanming, Bendixen Christian, Fredholm Merete
Division of Genetics and Bioinformatics, IBHV, Grønnegårdsvej 3, The Royal Veterinary and Agricultural University, Frederiksberg C, Denmark.
Genome Biol. 2007;8(4):R45. doi: 10.1186/gb-2007-8-4-r45.
Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages.
Using the Distiller package, the ESTs were assembled into roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high-confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries yielded the following findings. The distribution of cluster sizes is scale invariant. Brain and testes are among the tissues with the greatest number of distinct expressed genes, whereas tissues with more specialized functions, such as developing liver, express fewer genes. There are at least 65 high-confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression of genes between tissues, in particular brain/spinal cord, and found patterns of correlation between genes that share expression in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories.
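The abstract does not state how scale invariance was assessed or which criteria defined housekeeping and library-specific candidates. The following is only a minimal sketch under simplified, hypothetical criteria, not the authors' method: a cluster detected in every library counts as a housekeeping candidate, a cluster detected in exactly one library counts as a library-specific candidate, and scale invariance is probed by fitting a straight line to the log-log cluster-size frequency plot. The function names `classify_clusters` and `loglog_slope`, the input layout, and the toy data are illustrative assumptions.

```python
import math
from collections import Counter


def classify_clusters(counts, libraries):
    """Split gene clusters into presence-based candidate sets.

    counts: dict mapping cluster id -> dict mapping library name -> EST count.
    libraries: all non-normalized libraries considered.
    Hypothetical simplified criteria: detected in every library -> housekeeping
    candidate; detected in exactly one library -> library-specific candidate.
    """
    housekeeping, specific = [], {}
    for cluster, per_lib in counts.items():
        present = [lib for lib in libraries if per_lib.get(lib, 0) > 0]
        if len(present) == len(libraries):
            housekeeping.append(cluster)
        elif len(present) == 1:
            specific[cluster] = present[0]
    return housekeeping, specific


def loglog_slope(cluster_sizes):
    """Least-squares slope of log(number of clusters) vs log(cluster size).

    cluster_sizes: one entry per cluster, giving its number of ESTs.
    Under a power law N(s) ~ s**(-alpha) the points lie on a straight line
    with slope -alpha, and rescaling s by c only rescales N by c**(-alpha).
    """
    freq = Counter(cluster_sizes)  # cluster size -> number of clusters
    xs = [math.log(s) for s in freq]
    ys = [math.log(n) for n in freq.values()]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct cluster sizes")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy example (made-up data, for illustration only).
libs = ["brain", "testis", "liver"]
counts = {
    "c1": {"brain": 5, "testis": 3, "liver": 2},
    "c2": {"brain": 4},
    "c3": {"testis": 1, "liver": 1},
}
hk, spec = classify_clusters(counts, libs)
print(hk)    # ['c1']
print(spec)  # {'c2': 'brain'}
```

A real analysis would replace the presence/absence rules with statistical thresholds that account for library depth, since a cluster can be missed in a small library purely by sampling.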
This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies.