Butte Atul J, Kohane Isaac S
Stanford Medical Informatics, Department of Medicine, Stanford University School of Medicine, 251 Campus Drive, Room X-215, Stanford, California 94305-5479, USA.
Nat Biotechnol. 2006 Jan;24(1):55-62. doi: 10.1038/nbt1150.
Although gene and protein measurements are increasing in quantity and comprehensiveness, they do not characterize a sample's entire phenotype in an environmental or experimental context. Here we comprehensively consider associations between components of phenotype, genotype and environment to identify genes that may govern phenotype and responses to the environment. Context from the annotations of gene expression data sets in the Gene Expression Omnibus is represented using the Unified Medical Language System, a compendium of biomedical vocabularies with nearly 1 million concepts. After showing how data sets can be clustered by annotative concepts, we find a network of relations between phenotypic, disease, environmental and experimental contexts as well as genes with differential expression associated with these concepts. We identify novel genes related to concepts such as aging. Comprehensively identifying genes related to phenotype and environment is a step toward the Human Phenome Project.