Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.
PLoS Biol. 2009 Nov;7(11):e1000247. doi: 10.1371/journal.pbio.1000247. Epub 2009 Nov 24.
Scientists and clinicians who study genetic alterations and disease have traditionally described phenotypes in natural language. The considerable variation in these free-text descriptions has hindered the important task of identifying candidate genes and models for human diseases, and indicates the need for a computationally tractable method to mine data resources for mutant phenotypes. In this study, we tested the hypothesis that ontological annotation of disease phenotypes will facilitate the discovery of new genotype-phenotype relationships within and across species. To describe phenotypes using ontologies, we used an Entity-Quality (EQ) methodology, wherein the affected entity (E) and how it is affected (Q) are recorded using terms from a variety of ontologies. Using this EQ method, we annotated the phenotypes of 11 gene-linked human diseases described in Online Mendelian Inheritance in Man (OMIM). These human annotations were loaded into our Ontology-Based Database (OBD) along with other ontology-based phenotype descriptions of mutants from various model organism databases. Phenotypes recorded with this EQ method can be computationally compared based on the hierarchy of terms in the ontologies and the frequency of annotation. We used four similarity metrics to compare phenotypes and developed an ontology of homologous and analogous anatomical structures to compare phenotypes between species. Using these tools, we demonstrate that we can identify, through the similarity of the recorded phenotypes, other alleles of the same gene, other members of a signaling pathway, and orthologous genes and pathway members across species. We conclude that EQ-based annotation of phenotypes, in conjunction with a cross-species ontology and a variety of similarity metrics, can identify biologically meaningful similarities between genes by comparing phenotypes alone. This annotation and search method provides a novel and efficient means to identify gene candidates and animal models of human disease, which may shorten the lengthy path to identification and understanding of the genetic basis of human disease.
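To make the idea of comparing EQ-annotated phenotypes by term hierarchy and annotation frequency more concrete, here is a minimal Python sketch. It is not the authors' implementation and does not use OBD. The toy hierarchy, the gene names, the `EQ` class, and the two scores shown (a Jaccard overlap of ancestor sets and the information content of the most specific shared ancestor) are illustrative assumptions, and the abstract does not specify which four metrics were used. For brevity the sketch compares only the entity (E) part of each annotation and ignores the quality (Q).

```python
import math
from dataclasses import dataclass

# Hypothetical toy hierarchy (term -> parent terms); not real ontology content.
PARENTS = {
    "eye": ["sense organ"],
    "ear": ["sense organ"],
    "sense organ": ["organ"],
    "heart": ["organ"],
    "organ": [],
}


@dataclass(frozen=True)
class EQ:
    """One phenotype annotation: affected entity (E) and how it is affected (Q)."""
    entity: str
    quality: str


# Hypothetical annotation corpus: gene -> list of EQ annotations.
CORPUS = {
    "gene_a": [EQ("eye", "small")],
    "gene_b": [EQ("ear", "malformed")],
    "gene_c": [EQ("heart", "enlarged")],
    "gene_d": [EQ("eye", "absent")],
}


def ancestors(term):
    """Return the term together with all of its ancestors in PARENTS."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def entity_closure(annotations):
    """Union of the ancestor sets of every annotated entity."""
    closure = set()
    for annotation in annotations:
        closure |= ancestors(annotation.entity)
    return closure


def jaccard(a, b):
    """Hierarchy-based score: shared ancestor terms over all ancestor terms."""
    ca, cb = entity_closure(a), entity_closure(b)
    return len(ca & cb) / len(ca | cb)


def information_content(term, corpus):
    """Frequency-based score: rarely annotated terms are more informative."""
    hits = sum(1 for anns in corpus.values() if term in entity_closure(anns))
    if hits == 0:
        raise ValueError(f"term {term!r} is never annotated in the corpus")
    return math.log2(len(corpus) / hits)


def max_ic(a, b, corpus):
    """Information content of the most informative term shared by a and b."""
    common = entity_closure(a) & entity_closure(b)
    if not common:
        return 0.0
    return max(information_content(term, corpus) for term in common)


if __name__ == "__main__":
    for other in ("gene_b", "gene_c", "gene_d"):
        print(
            other,
            jaccard(CORPUS["gene_a"], CORPUS[other]),
            max_ic(CORPUS["gene_a"], CORPUS[other], CORPUS),
        )
```

In this toy corpus, `gene_a` scores highest against `gene_d` (both annotate the eye), lower against `gene_b` (they share only the more general "sense organ"), and lowest against `gene_c` (they share only the root term "organ", which every gene is annotated to and which therefore carries no information).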