Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA.
Biochem J. 2009 Dec 14;425(1):1-11. doi: 10.1042/BJ20091328.
Like other forms of engineering, metabolic engineering requires knowledge of the components (the 'parts list') of the target system. Lack of such knowledge impairs both rational engineering design and diagnosis of the reasons for failures; it also poses problems for the related field of metabolic reconstruction, which uses a cell's parts list to recreate its metabolic activities in silico. Despite spectacular progress in genome sequencing, the parts lists for most organisms that we seek to manipulate remain highly incomplete, due to the dual problem of 'unknown' proteins and 'orphan' enzymes. The former are all the proteins deduced from genome sequence that have no known function, and the latter are all the enzymes described in the literature (and often catalogued in the EC database) for which no corresponding gene has been reported. Unknown proteins constitute up to about half of the proteins in prokaryotic genomes, and much more than this in higher plants and animals. Orphan enzymes make up more than a third of the EC database. Attacking the 'missing parts list' problem is accordingly one of the great challenges for post-genomic biology, and a tremendous opportunity to discover new facets of life's machinery. Success will require a co-ordinated community-wide attack, sustained over years. In this attack, comparative genomics is probably the single most effective strategy, for it can reliably predict functions for unknown proteins and genes for orphan enzymes. Furthermore, it is cost-efficient and increasingly straightforward to deploy owing to a proliferation of databases and associated tools.