Cameron Delroy, Sheth Amit P, Jaykumar Nishita, Thirunarayan Krishnaprasad, Anand Gaurish, Smith Gary A
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, Dayton OH 45435, USA.
Web Semant. 2014 Dec;29:39-52. doi: 10.1016/j.websem.2014.11.002.
While contemporary semantic search systems offer to improve on classical keyword-based search, they are not always adequate for complex domain-specific information needs. The domain of prescription drug abuse, for example, requires knowledge of both ontological concepts and "intelligible constructs" not typically modeled in ontologies. These intelligible constructs convey essential information, including notions of intensity, frequency, interval, dosage and sentiment, which could be important to the holistic needs of the information seeker. In this paper, we present a hybrid approach to domain-specific information retrieval that integrates ontology-driven query interpretation with synonym-based query expansion and domain-specific rules, to facilitate search in social media on prescription drug abuse. Our framework is based on a context-free grammar (CFG) that defines the query language of constructs interpretable by the search system. The grammar provides two levels of semantic interpretation: 1) a top-level CFG that facilitates retrieval of diverse textual patterns, which belong to broad templates, and 2) a low-level CFG that enables interpretation of specific expressions belonging to such textual patterns. These low-level expressions occur as concepts from four different categories of data: 1) ontological concepts, 2) concepts in lexicons (such as emotions and sentiments), 3) concepts in lexicons with only partial ontology representation, called lexico-ontology concepts (such as side effects and routes of administration (ROA)), and 4) domain-specific expressions (such as date, time, interval, frequency and dosage) derived solely through rules. Our approach is embodied in a novel Semantic Web platform called PREDOSE, which provides search support for complex domain-specific information needs in prescription drug abuse epidemiology.
When applied to a corpus of over 1 million drug abuse-related web forum posts, our search framework proved effective in retrieving relevant documents when compared with three existing search systems.
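The two-level interpretation described above can be illustrated with a minimal sketch. It is not the PREDOSE grammar: for brevity it compiles query templates to regular expressions instead of parsing with a CFG, and every name in it (`ONTOLOGY_SYNONYMS`, `LEXICON`, `RULES`, `low_level`, `compile_query`, `search`), along with the example vocabulary and patterns, is a hypothetical assumption made for illustration. The top level treats a query as an ordered template of `<symbol>` slots; the low level resolves each slot to an ontology concept expanded with synonyms, a lexicon entry, or a rule-derived expression such as a dosage.

```python
import re

# Hypothetical stand-in for ontology concepts with synonym-based expansion.
ONTOLOGY_SYNONYMS = {
    "buprenorphine": ["buprenorphine", "bupe", "suboxone"],
    "loperamide": ["loperamide", "imodium"],
}

# Hypothetical lexicon concepts (no ontology backing).
LEXICON = {
    "positive_sentiment": ["great", "love", "amazing"],
    "negative_sentiment": ["awful", "hate", "terrible"],
}

# Hypothetical domain-specific expressions derived solely through rules.
RULES = {
    "DOSAGE": r"\d+(?:\.\d+)?\s?(?:mg|mcg|ml|g)\b",
    "FREQUENCY": r"(?:once|twice|\d+\s?(?:x|times))\s(?:a|per)\s(?:day|week)",
}


def low_level(symbol):
    """Low-level interpretation: map one query symbol to a regex fragment."""
    if symbol in RULES:
        return RULES[symbol]
    terms = ONTOLOGY_SYNONYMS.get(symbol) or LEXICON.get(symbol)
    if terms is None:
        raise ValueError(f"unknown query symbol: {symbol}")
    # Expand the concept to all of its surface forms.
    return r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"


def compile_query(query, max_gap=40):
    """Top-level interpretation: an ordered template of <symbol> slots.

    Slots must occur in order, separated by at most max_gap characters.
    """
    symbols = re.findall(r"<([A-Za-z_]+)>", query)
    if not symbols:
        raise ValueError("query contains no <symbol> slots")
    gap = r".{0,%d}?" % max_gap
    body = gap.join(f"(?:{low_level(s)})" for s in symbols)
    return re.compile(body, re.IGNORECASE | re.DOTALL)


def search(query, posts):
    """Return the posts that match the compiled query template."""
    pattern = compile_query(query)
    return [p for p in posts if pattern.search(p)]
```

Under these assumptions, a call such as `search("<DOSAGE> <buprenorphine>", posts)` would retrieve posts that mention a dosage shortly before any surface form of the drug. A real CFG would additionally allow nested and recursive templates, which this flat regular-expression approximation cannot express.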