Foerstner Konrad U, von Mering Christian, Bork Peer
European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
Philos Trans R Soc Lond B Biol Sci. 2006 Mar 29;361(1467):519-23. doi: 10.1098/rstb.2005.1809.
Environmental sequencing, also dubbed metagenomics, is increasingly being used to obtain insights into organismal communities in diverse habitats, and has a variety of foreseeable applications in biotechnology and medicine. The first public large-scale data sets already provide a wealth of information, hidden in vast numbers of DNA fragments from unknown species residing in these environments. Comparative sequence analysis is essential for the interpretation of such data. However, the different layers of complexity intrinsic to each sample require the establishment of some baselines for comparison: how to normalize for differences in phylogenetic and functional diversity, how to avoid biases from incomplete data, and how to deal with differences in species dominance or genome size. Here we discuss a few of these issues and delineate some simple discriminative sequence properties for four distinct habitats.