Satoh Kouji, Doi Koji, Nagata Toshifumi, Kishimoto Naoki, Suzuki Kohji, Otomo Yasuhiro, Kawai Jun, Nakamura Mari, Hirozane-Kishikawa Tomoko, Kanagawa Saeko, Arakawa Takahiro, Takahashi-Iida Juri, Murata Mitsuyoshi, Ninomiya Noriko, Sasaki Daisuke, Fukuda Shiro, Tagami Michihira, Yamagata Harumi, Kurita Kanako, Kamiya Kozue, Yamamoto Mayu, Kikuta Ari, Bito Takahito, Fujitsuka Nahoko, Ito Kazue, Kanamori Hiroyuki, Choi Il-Ryong, Nagamura Yoshiaki, Matsumoto Takashi, Murakami Kazuo, Matsubara Ken-ichi, Carninci Piero, Hayashizaki Yoshihide, Kikuchi Shoshi
Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki, Japan.
PLoS One. 2007 Nov 28;2(11):e1235. doi: 10.1371/journal.pone.0001235.
Rice (Oryza sativa L.) is a model organism for the functional genomics of monocotyledonous plants because its genome is considerably smaller than those of other monocotyledonous plants. Although highly accurate genome sequences of indica and japonica rice are available, additional resources such as full-length complementary DNA (FL-cDNA) sequences are indispensable for comprehensive analyses of gene structure and function. We cross-referenced the 28.5K individual loci in the rice genome, defined by mapping 578K FL-cDNA clones, with the 56K loci predicted in the TIGR genome assembly. Based on annotation status and the presence of corresponding cDNA clones, genes were classified into 23K annotated expressed (AE) genes, 33K annotated non-expressed (ANE) genes, and 5.5K non-annotated expressed (NAE) genes. We developed a 60-mer oligo-array to analyze gene expression from each locus. Analysis of gene structures and expression levels revealed that the general features of gene structure and expression of NAE and ANE genes differed considerably from those of AE genes. The results also suggested that the cloning efficiency of rice FL-cDNA is associated with the transcription activity of the corresponding genetic locus, although other factors may also have an effect. Comparison of FL-cDNA coverage among gene families suggested that FL-cDNAs from genes encoding rice- or eukaryote-specific domains, and from genes involved in regulatory functions, were difficult to produce in bacterial cells. Collectively, these results indicate that rice genes can be divided into distinct groups based on transcription activity and gene structure, and that a coverage bias exists among FL-cDNA clones because certain eukaryotic genes are incompatible with bacteria.
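The three-way classification described in the abstract depends on two yes/no properties of each locus: whether it is annotated in the TIGR assembly and whether at least one FL-cDNA clone maps to it. The following is a minimal sketch of that decision rule, not the authors' pipeline. The function name `classify_locus`, the boolean inputs, and the toy locus names are all hypothetical and serve only to illustrate the logic.

```python
from collections import Counter
from typing import Optional


def classify_locus(annotated: bool, has_flcdna: bool) -> Optional[str]:
    """Assign a locus to AE, ANE, or NAE (hypothetical helper).

    annotated  -- the locus is predicted in the genome annotation
    has_flcdna -- at least one FL-cDNA clone maps to the locus
    """
    if annotated and has_flcdna:
        return "AE"   # annotated expressed
    if annotated:
        return "ANE"  # annotated non-expressed
    if has_flcdna:
        return "NAE"  # non-annotated expressed
    return None       # neither predicted nor supported by a clone


# Toy input: locus name -> (annotated, has_flcdna). Invented for illustration.
loci = {
    "locus_1": (True, True),
    "locus_2": (True, False),
    "locus_3": (False, True),
    "locus_4": (True, True),
}

classes = {name: classify_locus(a, c) for name, (a, c) in loci.items()}
counts = Counter(classes.values())
print(counts)
```

Under this rule the reported totals are mutually consistent: AE plus ANE gives the 56K annotated loci (23K + 33K), and AE plus NAE gives the 28.5K loci supported by FL-cDNA clones (23K + 5.5K).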