Division of Biomedical Informatics, University of California San Diego, La Jolla, California, USA.
J Am Med Inform Assoc. 2014 Jan-Feb;21(1):31-6. doi: 10.1136/amiajnl-2013-001882. Epub 2013 Aug 29.
The database of Genotypes and Phenotypes (dbGaP), developed by the National Center for Biotechnology Information (NCBI), contains information on various genome-wide association studies (GWAS) and is currently available through NCBI's dbGaP Entrez interface. It is an important resource: authorized users can use its GWAS data for new exploratory research or cross-study validation. However, finding studies relevant to a particular phenotype of interest is challenging because phenotype information is presented in a non-standardized way. To address this issue, we developed PhenDisco (phenotype discoverer), a new information retrieval system for dbGaP. PhenDisco consists of two main components: (1) text processing tools that standardize phenotype variables and study metadata, and (2) information retrieval tools that support user queries and return ranked results. In a preliminary comparison with Entrez, dbGaP's search engine, across 18 search scenarios, PhenDisco showed promising performance in both unranked and ranked search. The system can be accessed at http://pfindr.net.
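The two-component design described above (standardize phenotype text, then answer queries with ranked results) can be illustrated with a toy sketch. This is not PhenDisco's implementation. The synonym table, the study IDs, the variable descriptions, and the scoring rule (count of matching variables per study) are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical synonym table: raw phenotype wording -> standardized concept.
SYNONYMS = {
    "bmi": "body mass index",
    "body mass index": "body mass index",
    "sbp": "systolic blood pressure",
    "systolic bp": "systolic blood pressure",
    "systolic blood pressure": "systolic blood pressure",
}


def standardize(description):
    """Return the standardized concepts mentioned in a free-text description."""
    # Lowercase, replace punctuation with spaces, and collapse whitespace.
    text = re.sub(r"[^a-z0-9 ]+", " ", description.lower())
    padded = " " + " ".join(text.split()) + " "
    # Pad with spaces so that only whole words or phrases match.
    return {
        concept
        for phrase, concept in SYNONYMS.items()
        if f" {phrase} " in padded
    }


def build_index(studies):
    """Map each study ID to a count of its variables per standardized concept."""
    index = {}
    for study_id, variables in studies.items():
        counts = defaultdict(int)
        for variable in variables:
            for concept in standardize(variable):
                counts[concept] += 1
        index[study_id] = dict(counts)
    return index


def search(index, query):
    """Return (study_id, score) pairs, highest score first, ties broken by ID."""
    concepts = standardize(query)
    scored = []
    for study_id, counts in index.items():
        score = sum(counts.get(concept, 0) for concept in concepts)
        if score > 0:
            scored.append((study_id, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Made-up studies and variable descriptions, for illustration only.
STUDIES = {
    "study_A": ["BMI at baseline", "Body mass index (kg/m2)", "Systolic BP, visit 1"],
    "study_B": ["SBP seated"],
    "study_C": ["Hair color"],
}
INDEX = build_index(STUDIES)

if __name__ == "__main__":
    print(search(INDEX, "BMI"))
    print(search(INDEX, "systolic blood pressure"))
```

In this sketch, differently worded variables ("BMI at baseline", "Body mass index (kg/m2)") resolve to the same concept, so a query phrased either way reaches the same studies. A real system would need a much richer vocabulary and ranking model than a fixed dictionary and a match count.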