Connecting genotype to phenotype in the era of high-throughput sequencing.

Author information

Henry Christopher S, Overbeek Ross, Xia Fangfang, Best Aaron A, Glass Elizabeth, Gilbert Jack, Larsen Peter, Edwards Rob, Disz Terry, Meyer Folker, Vonstein Veronika, Dejongh Matthew, Bartels Daniela, Desai Narayan, D'Souza Mark, Devoid Scott, Keegan Kevin P, Olson Robert, Wilke Andreas, Wilkening Jared, Stevens Rick L

Affiliation

Argonne National Laboratory, Argonne, IL 60439, USA.

Publication information

Biochim Biophys Acta. 2011 Oct;1810(10):967-77. doi: 10.1016/j.bbagen.2011.03.010. Epub 2011 Mar 21.

DOI:10.1016/j.bbagen.2011.03.010

PMID:21421023

Abstract

BACKGROUND

The development of next-generation sequencing technology is rapidly changing the face of the genome annotation and analysis field. One of the primary uses for genome sequence data is to improve our understanding and prediction of phenotypes for microbes and microbial communities, but the technologies for predicting phenotypes must keep pace with the new sequences emerging.

SCOPE OF REVIEW

This review presents an integrated view of the methods and technologies used in the inference of phenotypes for microbes and microbial communities based on genomic and metagenomic data. Given the breadth of this topic, we place special focus on the resources available within the SEED Project. We discuss the two steps involved in connecting genotype to phenotype, sequence annotation and phenotype inference, and we highlight the challenges in each of these steps when dealing with both single-genome and metagenome data.

MAJOR CONCLUSIONS

This integrated view of the genotype-to-phenotype problem highlights the importance of a controlled ontology in the annotation of genomic data, as this benefits subsequent phenotype inference and metagenome annotation. We also note the importance of expanding the set of reference genomes to improve the annotation of all sequence data, and we highlight metagenome assembly as a potential new source for complete genomes. Finally, we find that phenotype inference, particularly from metabolic models, generates predictions that can be validated and reconciled to improve annotations.

GENERAL SIGNIFICANCE

This review presents the first look at the challenges and opportunities associated with the inference of phenotype from genotype during the next-generation sequencing revolution. This article is part of a Special Issue entitled: Systems Biology of Microorganisms.
