Biotechnology Center (BIOTEC), Technische Universität Dresden, 01062, Dresden, Germany.
BMC Bioinformatics. 2009 Oct 1;10(Suppl 10):S7. doi: 10.1186/1471-2105-10-S10-S7.
Current search engines are keyword-based. Semantic technologies promise a next generation of semantic search engines that will be able to answer questions. Current approaches either apply natural language processing to unstructured text or assume the existence of structured statements over which they can reason.
Here, we introduce a third approach, GoWeb, which combines classical keyword-based Web search with text-mining and ontologies to navigate large result sets and facilitate question answering. We evaluate GoWeb on three benchmarks of questions: on genes and functions, on symptoms and diseases, and on proteins and diseases. The first benchmark is based on BioCreAtivE 1 Task 2 and links 457 gene names with 1352 functions. GoWeb finds 58% of the functional Gene Ontology annotations. The second benchmark is based on 26 case reports and links symptoms with diseases. GoWeb achieves a 77% success rate, improving on an existing approach by nearly 20%. The third benchmark is based on 28 questions in the TREC genomics challenge and links proteins to diseases. GoWeb achieves a success rate of 79%.
GoWeb's combination of classical Web search with text-mining and ontologies is a first step towards answering questions in the biomedical domain. GoWeb is online at: http://www.gopubmed.org/goweb.
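The abstract does not spell out how the combination of keyword search, text-mining and ontologies works in practice. As a rough illustration only, and not GoWeb's actual implementation, the general idea of annotating search-result snippets with ontology terms and grouping them by concept might be sketched as follows. The toy ontology, the naive substring matching, and all function names here are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy ontology: term -> parent term (None marks the root).
# A real system would load a full ontology such as the Gene Ontology.
TOY_ONTOLOGY = {
    "biological process": None,
    "apoptosis": "biological process",
    "cell cycle": "biological process",
}


def annotate(snippet, ontology):
    """Return the ontology terms whose label occurs in the snippet.

    Assumption: plain case-insensitive substring matching stands in
    for real text-mining.
    """
    text = snippet.lower()
    return {term for term in ontology if term in text}


def ancestors(term, ontology):
    """Follow parent links from a term up to the root."""
    chain = []
    parent = ontology[term]
    while parent is not None:
        chain.append(parent)
        parent = ontology[parent]
    return chain


def group_by_concept(snippets, ontology):
    """Index snippet positions under each matched term and its ancestors."""
    index = defaultdict(set)
    for position, snippet in enumerate(snippets):
        for term in annotate(snippet, ontology):
            index[term].add(position)
            for ancestor in ancestors(term, ontology):
                index[ancestor].add(position)
    return dict(index)


# Stand-ins for snippets returned by a keyword-based Web search.
snippets = [
    "p53 induces apoptosis in tumour cells",
    "Cyclin D1 regulates the cell cycle",
    "A recipe for apple pie",
]
index = group_by_concept(snippets, TOY_ONTOLOGY)
```

In this sketch, selecting a general concept such as "biological process" narrows the result set to the snippets annotated with it or with any of its descendants, while snippets that match no ontology term are left out of the index.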