Tian Ai-Guo, Wang Jun, Cui Peng, Han Yu-Jun, Xu Hao, Cong Li-Juan, Huang Xian-Gang, Wang Xiao-Ling, Jiao Yong-Zhi, Wang Bang-Jun, Wang Yong-Jun, Zhang Jin-Song, Chen Shou-Yi
Plant Biotechnology Laboratory, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Datun Road, 100101, Beijing, China.
Theor Appl Genet. 2004 Mar;108(5):903-13. doi: 10.1007/s00122-003-1499-2. Epub 2003 Nov 18.
We analyzed 314,254 soybean expressed sequence tags (ESTs), comprising 29,540 from our laboratory and 284,714 from GenBank. These ESTs were assembled into 56,147 unigenes. About 76.92% of the unigenes were homologous to genes from Arabidopsis thaliana (Arabidopsis). The putative products of these unigenes were annotated according to their homology with the categorized proteins of Arabidopsis. Genes in the categories cell growth and/or maintenance, enzymes, and cell communication belonged to the slow-evolving class, whereas genes in the categories transcription regulation, cell, binding, and death appeared to be fast-evolving. Soybean unigenes with no match to genes in the Arabidopsis genome were identified as soybean-specific genes. These genes were mainly involved in nodule development and the synthesis of seed storage proteins. We also identified 61 genes regulated by salicylic acid, 1,322 transcription factor genes, and 326 disease resistance-like genes among the soybean unigenes. Simple sequence repeat (SSR) analysis showed that the soybean genome was more complex than the Arabidopsis and Medicago truncatula genomes. The GC content of the soybean unigene sequences was similar to that of Arabidopsis and M. truncatula. Furthermore, the combined analysis of the EST database and the BAC-contig sequences indicated that the soybean genome contains about 63,501 genes.