Nara Institute of Science and Technology, Ikoma, Japan.
Gene. 2012 Jul 15;503(1):56-64. doi: 10.1016/j.gene.2012.04.043. Epub 2012 Apr 24.
Operon-like arrangements of genes occur in eukaryotes ranging from yeasts and filamentous fungi to nematodes, plants, and mammals. In plants, several examples of operon-like gene clusters involved in metabolic pathways have recently been characterized, e.g. the cyclic hydroxamic acid pathways in maize, the avenacin biosynthesis gene clusters in oat, the thalianol pathway in Arabidopsis thaliana, and the diterpenoid momilactone cluster in rice. Such operon-like gene clusters are defined by their co-regulation or by their neighboring positions within a chromosomal region. A comprehensive analysis of the expression of neighboring genes is therefore a crucial step toward revealing the complete set of operon-like gene clusters within a genome. Genome-wide prediction of operon-like gene clusters should contribute to functional annotation efforts and also provide novel insight into how certain biological functions were acquired during evolution. We predicted co-expressed gene clusters in A. thaliana from 1469 microarray gene expression datasets by comparing the Pearson correlation coefficients of neighboring genes with those of randomly selected gene pairs, using a statistical method that takes the false discovery rate (FDR) into consideration. We estimated that A. thaliana contains 100 operon-like gene clusters in total. We predicted 34 statistically significant gene clusters consisting of 3 to 22 genes each, based on a stringent FDR threshold of 0.1. Functional relationships among genes in individual clusters were estimated from the sequence similarity and functional annotation of the genes. Duplicated gene pairs (determined by BLAST with a cutoff of E < 10^-5) are included in 27 clusters. Five clusters are associated with metabolism, containing P450 genes restricted to the Brassica family and predicted to be involved in secondary metabolism.
Operon-like clusters tend to include genes encoding bio-machinery associated with ribosomes, the ubiquitin/proteasome system, secondary metabolic pathways, lipid and fatty-acid metabolism, and the lipid transfer system.
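The comparison of neighboring-gene correlations against randomly selected gene pairs can be illustrated with a small sketch. This is not the authors' exact procedure, which the abstract does not spell out. It assumes a simple empirical FDR estimate (the fraction of random pairs at or above a correlation cutoff, divided by the fraction of neighboring pairs at or above it), it runs on synthetic data with one planted cluster, and all function names are hypothetical.

```python
import numpy as np

def unit_rows(expr):
    """Center each gene (row) and scale it to unit length, so the dot
    product of two rows equals their Pearson correlation coefficient.
    Assumes no gene has constant expression."""
    z = np.asarray(expr, dtype=float)
    z = z - z.mean(axis=1, keepdims=True)
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def neighbor_correlations(z):
    """Pearson r between gene k and gene k+1 (rows are in chromosomal order)."""
    return np.sum(z[:-1] * z[1:], axis=1)

def random_pair_correlations(z, n_pairs, rng):
    """Pearson r for randomly drawn gene pairs (the empirical null)."""
    i = rng.integers(0, len(z), n_pairs)
    j = rng.integers(0, len(z), n_pairs)
    keep = i != j  # drop self-pairs, which would trivially give r = 1
    return np.sum(z[i[keep]] * z[j[keep]], axis=1)

def threshold_for_fdr(neighbor_r, random_r, fdr=0.1):
    """Smallest cutoff whose estimated FDR is <= fdr, or None if none exists.
    Assumed estimator: tail fraction of random pairs / tail fraction of neighbors."""
    for t in np.sort(neighbor_r):
        observed = np.mean(neighbor_r >= t)  # never zero: t is one of the values
        null = np.mean(random_r >= t)
        if null / observed <= fdr:
            return float(t)
    return None

def call_clusters(neighbor_r, cutoff, min_genes=3):
    """Chain consecutive above-cutoff neighbor pairs into clusters.
    Returns inclusive (first_gene, last_gene) index pairs."""
    clusters, start = [], None
    for k, r in enumerate(neighbor_r):
        if r >= cutoff:
            if start is None:
                start = k
        elif start is not None:
            if k - start + 1 >= min_genes:
                clusters.append((start, k))
            start = None
    if start is not None and len(neighbor_r) - start + 1 >= min_genes:
        clusters.append((start, len(neighbor_r)))
    return clusters

# Synthetic example: 200 genes x 60 samples, with genes 50..55 sharing a signal.
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 60))
shared = rng.normal(size=60)
expr[50:56] = shared + 0.3 * rng.normal(size=(6, 60))

z = unit_rows(expr)
neighbor_r = neighbor_correlations(z)
random_r = random_pair_correlations(z, 5000, rng)
cutoff = threshold_for_fdr(neighbor_r, random_r, fdr=0.1)
clusters = call_clusters(neighbor_r, cutoff) if cutoff is not None else []
print(cutoff, clusters)
```

The minimum cluster size of three genes mirrors the smallest cluster reported above. A real analysis would also need to handle chromosome boundaries and gene orientation, which this sketch ignores.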