Park Yoonseong, Aikins Jamie, Wang L J, Beeman Richard W, Oppert Brenda, Lord Jeffrey C, Brown Susan J, Lorenzen Marcé D, Richards Stephen, Weinstock George M, Gibbs Richard A
Department of Entomology, Kansas State University, Manhattan, KS 66506-4004, USA.
Insect Biochem Mol Biol. 2008 Apr;38(4):380-6. doi: 10.1016/j.ibmb.2007.09.008. Epub 2007 Sep 29.
The whole-genome sequence of Tribolium castaneum, a worldwide coleopteran pest of stored products, has recently been determined. To facilitate accurate annotation and detailed functional analysis of this genome, we compiled and analyzed all available expressed sequence tag (EST) data. The raw data consist of 61,228 ESTs: 10,704 obtained from NCBI and an additional 50,524 derived from 32,544 clones generated in our laboratories. These sequences came from cDNA libraries representing six different tissues or stages: whole embryos, whole larvae, larval hindguts and Malpighian tubules, larval fat bodies and carcasses, adult ovaries, and adult heads. Assembly collapsed the 61,228 sequences into 12,269 clusters (groups of overlapping ESTs representing single genes), of which 10,134 mapped onto 6,463 (39%) of the 16,422 GLEAN gene models (i.e., the official Tribolium gene list). Approximately 1,600 clusters (13% of the total) lack corresponding GLEAN models despite strong matches to the genome, suggesting that a considerable number of transcribed sequences were missed by the gene prediction programs or were removed by GLEAN. We conservatively estimate that the current EST set represents more than 7,500 transcription units.
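The counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustrative consistency check, not part of the authors' analysis; the variable names and the `percent` helper are hypothetical, and the figure of 1,600 clusters is the approximate value given in the text.

```python
# Counts as quoted in the abstract (variable names are illustrative only).
ncbi_ests = 10_704              # ESTs obtained from NCBI
lab_ests = 50_524               # ESTs from clones generated in the authors' laboratories
total_ests = 61_228             # total raw ESTs
clusters = 12_269               # clusters after assembly
glean_models = 16_422           # GLEAN gene models
glean_models_hit = 6_463        # GLEAN models with at least one mapped cluster
clusters_without_model = 1_600  # approximate, per the abstract


def percent(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# The two EST sources should add up to the stated total.
ests_sum_matches = (ncbi_ests + lab_ests == total_ests)

# Share of GLEAN models supported by EST clusters.
coverage_pct = percent(glean_models_hit, glean_models)

# Share of clusters with no corresponding GLEAN model.
orphan_pct = percent(clusters_without_model, clusters)

print(f"EST sources sum to total: {ests_sum_matches}")
print(f"GLEAN models with EST support: {coverage_pct}%")
print(f"Clusters lacking a GLEAN model: {orphan_pct}%")
```

Under these inputs the two percentages agree with the 39% and 13% reported in the abstract.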