Burgoon Lyle D, Zacharewski Timothy R
Department of Biochemistry & Molecular Biology, Michigan State University, East Lansing, Michigan, USA.
Methods Mol Biol. 2008;460:145-57. doi: 10.1007/978-1-60327-048-9_7.
"Omics" experiments amass large amounts of data, and interpreting them requires integrating several data sources. For instance, microarray, metabolomic, and proteomic experiments may at best yield a list of active genes, metabolites, or proteins, respectively. More typically, these experiments yield active features that represent, respectively, subsequences of a gene, a chemical shift within a complex mixture, or peptides. Thus, in the best-case scenario, the investigator is left to identify the functional significance of these features. More likely, the investigator must first establish each feature's larger context (e.g., which gene, metabolite, or protein the feature represents). Complete functional annotation requires several different databases, including sequence, genome, gene function, protein, and protein interaction databases. Because some microarrays or experiments have limited coverage, investigators may also consult biological data repositories (in the case of microarrays) to complement their results. Many of the data sources and databases available for characterizing gene function are discussed, including tools from the National Center for Biotechnology Information, Gene Ontology, and UniProt.