Mattes William B, Pettit Syril D, Sansone Susanna-Assunta, Bushel Pierre R, Waters Michael D
Pfizer Inc, Groton, Connecticut, USA.
Environ Health Perspect. 2004 Mar;112(4):495-505. doi: 10.1289/ehp.6697.
The marriage of toxicology and genomics has created not only opportunities but also novel informatics challenges. As with the larger field of gene expression analysis, toxicogenomics faces the problems of probe annotation and data comparison across different array platforms. Toxicogenomics studies are generally built on standard toxicology studies generating biological end point data, and as such, one goal of toxicogenomics is to detect relationships between changes in gene expression and in those biological parameters. These challenges are best addressed through data collection into a well-designed toxicogenomics database. A successful publicly accessible toxicogenomics database will serve as a repository for data sharing and as a resource for analysis, data mining, and discussion. It will offer a vehicle for harmonizing nomenclature and analytical approaches and serve as a reference for regulatory organizations to evaluate toxicogenomics data submitted as part of registrations. Such a database would capture the experimental context of in vivo studies with great fidelity such that the dynamics of the dose response could be probed statistically with confidence. This review presents the collaborative efforts among the European Molecular Biology Laboratory-European Bioinformatics Institute ArrayExpress, the International Life Sciences Institute Health and Environmental Sciences Institute, and the National Institute of Environmental Health Sciences National Center for Toxicogenomics Chemical Effects in Biological Systems knowledge base. The goal of this collaboration is to establish public infrastructure on an international scale and to examine other developments aimed at establishing toxicogenomics databases.
In this review we discuss several issues common to such databases: the requirement for identifying minimal descriptors to represent the experiment, the demand for standardizing data storage and exchange formats, the challenge of creating standardized nomenclature and ontologies to describe biological data, the technical problems involved in data upload, the necessity of defining parameters that assess and record data quality, and the development of standardized analytical approaches.