A new web-based data mining tool for the identification of candidate genes for human genetic disorders.

Author information

van Driel Marc A, Cuelenaere Koen, Kemmeren Patrick P C W, Leunissen Jack A M, Brunner Han G

Affiliation

Centre for Molecular and Biomolecular Informatics, University of Nijmegen, The Netherlands.

Publication information

Eur J Hum Genet. 2003 Jan;11(1):57-63. doi: 10.1038/sj.ejhg.5200918.

PMID:12529706

Abstract

To identify the gene underlying a human genetic disorder can be difficult and time-consuming. Typically, positional data delimit a chromosomal region that contains between 20 and 200 genes. The choice then lies between sequencing large numbers of genes, or setting priorities by combining positional data with available expression and phenotype data, contained in different internet databases. This process of examining positional candidates for possible functional clues may be performed in many different ways, depending on the investigator's knowledge and experience. Here, we report on a new tool called the GeneSeeker, which gathers and combines positional data and expression/phenotypic data in an automated way from nine different web-based databases. This results in a quick overview of interesting candidate genes in the region of interest. The GeneSeeker system is built in a modular fashion allowing for easy addition or removal of databases if required. Databases are searched directly through the web, which obviates the need for data warehousing. In order to evaluate the GeneSeeker tool, we analysed syndromes with known genesis. For each of 10 syndromes the GeneSeeker programme generated a shortlist that contained a significantly reduced number of candidate genes from the critical region, yet still contained the causative gene. On average, a list of 163 genes based on position alone was reduced to a more manageable list of 22 genes based on position and expression or phenotype information. We are currently expanding the tool by adding other databases. The GeneSeeker is available via the web-interface (http://www.cmbi.kun.nl/GeneSeeker/).
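
The prioritisation described in the abstract (position combined with expression or phenotype information) can be illustrated with a minimal sketch. The abstract does not specify GeneSeeker's actual query logic, so the set-based filter below is an assumption, and the function names and gene identifiers are hypothetical placeholders rather than real data. The only figures taken from the abstract are the average list sizes of 163 and 22 genes.

```python
def shortlist(positional, expression, phenotype):
    """Keep genes from the critical region that also have supporting
    expression OR phenotype evidence (assumed combination rule)."""
    functional = set(expression) | set(phenotype)
    return sorted(set(positional) & functional)


def reduction(before, after):
    """Fraction of candidates removed when a list shrinks from before to after."""
    return 1 - after / before


# Hypothetical placeholder identifiers, not real genes or database results.
positional = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
expression = ["GENE_B", "GENE_X"]
phenotype = ["GENE_D", "GENE_B"]

candidates = shortlist(positional, expression, phenotype)
print(candidates)

# Average list sizes reported in the abstract: 163 genes reduced to 22.
print(f"{reduction(163, 22):.1%}")
```

Under this reading, the reported averages correspond to removing roughly 86.5% of the positional candidates while the causative gene stays on the list.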
