Centre for Molecular and Biomolecular Informatics, Nijmegen Centre for Molecular Life Sciences, Radboud University Nijmegen Medical Centre, Geert Grooteplein 26-28, 6525 GA Nijmegen, The Netherlands.
Am J Hum Genet. 2009 Dec;85(6):801-8. doi: 10.1016/j.ajhg.2009.10.026.
Disease networks are increasingly explored as a complement to networks centered on interactions between genes and proteins. The quality of disease networks depends heavily on the amount and quality of phenotype information in phenotype databases of human genetic diseases. We explored which aspects of phenotype database architecture and content best reflect the underlying biology of disease. For this purpose, we used the OMIM-based HPO, Orphanet, and POSSUM phenotype databases and devised a biological coherence score, based on the sharing of Gene Ontology annotation, to investigate the degree to which phenotype similarity in these databases reflects related pathobiology. Our analyses support the notion that a fine-grained phenotype ontology enhances the accuracy of phenome representation. In addition, we find that the OMIM database, the one most used by the human genetics community, is heavily underannotated. We show that this problem can easily be overcome by adding data available in the POSSUM database to improve OMIM phenotype representations in the HPO. We also find that the use of feature frequency estimates (currently implemented only in the Orphanet database) significantly improves the quality of the phenome representation. Our data suggest that there is much to be gained by improving human phenome databases and that some of the measures needed to achieve this are relatively easy to implement. More generally, we propose that curation and more systematic annotation of human phenome databases can greatly improve the power of the phenotype for genetic disease analysis.
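The abstract does not spell out how the biological coherence score is computed, so the following is only a minimal sketch of one way such a score could work. It assumes that each disease is a set of phenotype features with optional frequency weights, that phenotype similarity is the cosine between feature vectors, and that coherence is the fraction of diseases whose most similar disease shares at least one Gene Ontology term. The function names, disease labels, feature names, weights, and GO labels are all hypothetical toy values, not data from OMIM, HPO, Orphanet, or POSSUM.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse {feature: weight} dicts."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def coherence(phenotypes, go_terms):
    """Fraction of diseases whose phenotypically nearest disease
    shares at least one GO term with them (an assumed definition)."""
    hits = 0
    for disease, feats in phenotypes.items():
        others = [o for o in phenotypes if o != disease]
        nearest = max(others, key=lambda o: cosine(feats, phenotypes[o]))
        if go_terms[disease] & go_terms[nearest]:
            hits += 1
    return hits / len(phenotypes)


# Hypothetical toy data: weights stand in for feature frequency estimates.
weighted = {
    "A": {"seizures": 0.9, "short_stature": 0.1},
    "B": {"seizures": 0.8, "ataxia": 0.5},
    "C": {"short_stature": 0.9, "scoliosis": 0.6, "kyphosis": 0.3},
    "D": {"short_stature": 0.8, "seizures": 0.1, "scoliosis": 0.2},
}
# Same features with frequencies discarded (present/absent only).
binary = {d: {f: 1.0 for f in feats} for d, feats in weighted.items()}

# Hypothetical GO labels for the genes behind each disease.
go_terms = {
    "A": {"neuronal_signaling"},
    "B": {"neuronal_signaling"},
    "C": {"skeletal_development"},
    "D": {"skeletal_development"},
}

print("binary coherence:  ", coherence(binary, go_terms))
print("weighted coherence:", coherence(weighted, go_terms))
```

The toy data are contrived so that rare features mislead the present/absent representation, which is the kind of effect frequency estimates are meant to correct. They say nothing about the magnitude of the effect reported in the paper.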