Max Delbrück Center for Molecular Medicine, Robert-Rössle-Str. 10, 13125 Berlin, Germany.
Nucleic Acids Res. 2011 Jul;39(Web Server issue):W455-61. doi: 10.1093/nar/gkr246. Epub 2011 May 23.
Biomedical literature is traditionally used to inform scientists of the relevance of genes to a research topic. However, many genes, especially those from poorly studied organisms, are not discussed in the literature. Moreover, a manual and comprehensive summarization of the literature attached to the genes of an organism is generally impossible because of the large number of genes and abstracts involved. We introduce the novel Génie algorithm, which overcomes these problems by evaluating the literature attached to all genes in a genome and to their orthologs according to a selected topic. Génie showed high precision (up to 100%) and the best performance in comparison to other algorithms in most of the benchmarks, especially when high sensitivity was required. Moreover, the prioritization of zebrafish genes involved in heart development, using human and mouse orthologs, showed high enrichment in differentially expressed genes from microarray experiments. The Génie web server supports hundreds of species and millions of genes, and offers novel functionalities. Typical run times below a minute, even when analyzing the human genome with hundreds of thousands of literature records, allow the use of Génie in routine lab work.