Program on Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92037, USA.
Curr Opin Struct Biol. 2011 Jun;21(3):398-403. doi: 10.1016/j.sbi.2011.03.010. Epub 2011 Apr 14.
Metagenomic sequencing projects have dramatically increased our knowledge of the protein universe and provided over one-half of currently known protein sequences; they have also introduced a much broader phylogenetic diversity into the protein databases. The full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, likely representing novel functions specific to given environments. At the same time, a deeper analysis of such novel families, including experimental structure determination of some representatives, suggests that most of them are distant homologs of already characterized protein families. Thus, most of the protein diversity present in the new environments is due to functional divergence of known protein families rather than the emergence of new ones.