Savolainen P, Fitzsimmons C, Arvestad L, Andersson L, Lundeberg J
Department of Biotechnology, Royal Institute of Technology, Stockholm, Sweden.
Cytogenet Genome Res. 2005;111(1):79-87. doi: 10.1159/000085674.
We report the generation, assembly and annotation of expressed sequence tags (ESTs) from four chicken cDNA libraries, constructed from brain and testis tissue dissected from red junglefowl and White Leghorn. In total, 21,285 5'-end ESTs were generated and assembled into 2,813 contigs and 9,737 singletons, giving 12,549 tentative unique transcripts. The transcripts were annotated using BLAST by matching them to known chicken genes or to putative homologues in other species in the major gene/protein databases. The results of these similarity searches are available at www.sbc.su.se/~arve/chicken. Of the transcripts, 4,129 (32.9%) remained without a significant match to gene/protein databases, a proportion of unmatched transcripts similar to that in earlier non-mammalian EST studies. To estimate how many of these transcripts may represent novel genes, they were examined for the presence of coding sequence. Most of the unique chicken transcripts do not contain coding parts of genes, but at least 400 of the transcripts were estimated to contain coding sequence, indicating that 3.2% of avian genes belong to previously unknown gene families. A further BLAST search against dbEST left 1,649 (13.1%) of the transcripts unmatched to any library. The number of completely unmatched transcripts containing coding sequence was estimated at 180, giving a measure of the number of putative novel chicken genes identified in this study. Of the identified transcripts, 84.3% were found only in testis tissue, which has been poorly studied in earlier chicken EST studies. Large differences in expression levels were found between the brain and testis libraries for a large number of transcripts, and among the 525 most frequently represented transcripts, at least 20 showed a significant difference in expression level between red junglefowl and White Leghorn.
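The proportions quoted in the abstract can be recomputed from the reported counts. The following is a minimal sketch, not part of the original study: the function name `percent_of_total` and the variable names are illustrative assumptions, and the only inputs are figures stated in the abstract.

```python
# Recompute the percentages quoted in the abstract from the reported counts.
# All names here are hypothetical; only the numbers come from the abstract.

# Reported total of tentative unique transcripts. The reported contigs (2,813)
# and singletons (9,737) sum to 12,550, one more than this figure; the abstract
# does not explain the difference, and the quoted percentages match 12,549.
TOTAL_TRANSCRIPTS = 12_549


def percent_of_total(count: int, total: int = TOTAL_TRANSCRIPTS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Transcripts without a significant match in gene/protein databases.
unmatched_db = percent_of_total(4_129)

# Transcripts still unmatched after the additional dbEST search.
unmatched_dbest = percent_of_total(1_649)

# Lower-bound estimate of unmatched transcripts that contain coding sequence.
coding_unmatched = percent_of_total(400)

if __name__ == "__main__":
    print(f"No gene/protein database match: {unmatched_db}%")
    print(f"No match after dbEST search:    {unmatched_dbest}%")
    print(f"Estimated coding, unmatched:    {coding_unmatched}%")
```

Using the reported total of 12,549 as the denominator, the three values agree with the 32.9%, 13.1% and 3.2% given in the abstract.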