Department of Medicine, The University of Arizona, Tucson, AZ, USA.
BIO5 Institute, The University of Arizona, Tucson, AZ, USA.
J Healthc Eng. 2017;2017:3818302. doi: 10.1155/2017/3818302. Epub 2017 Aug 30.
The exposome is a critical dimension of the precision medicine paradigm. Effective representation of exposomics knowledge is instrumental to melding nongenetic factors into data analytics for clinical research. There is still limited work on (1) modeling exposome entities and relations with proper integration into mainstream ontologies and (2) systematically studying their presence in the clinical context. Using selected ontological relations, we developed a template-driven approach to identifying exposome concepts in the Unified Medical Language System (UMLS). The derived concepts were evaluated in terms of literature coverage and their ability to assist in annotating clinical text. The generated semantic model represents rich domain knowledge about exposure events (454 exposure-outcome relation pairs). Additionally, a list of 5667 disorder concepts with microbial etiology was created for inferring pathogen exposures. The model consistently covered about 90% of the PubMed literature on exposure-induced iatrogenic diseases over a 10-year period (2001-2010). The model improved the efficiency of exposome annotation in clinical text by filtering out 78% of irrelevant machine annotations. Analysis of 50 annotated discharge summaries advanced our understanding of exposome information in clinical text. This pilot study demonstrated the feasibility of semiautomatically developing a useful semantic resource for exposomics.
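The two ideas summarized above, selecting exposure-outcome pairs by matching ontological relations against a set of "templates" and then using the resulting concept set to discard machine annotations that fall outside it, can be illustrated with a minimal toy sketch. It is not the authors' pipeline: the relation labels in `EXPOSURE_TEMPLATES`, the placeholder concept identifiers (`EXP_A`, `DIS_X`, and so on, which are not real UMLS CUIs), the example annotations, and all function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical relation "templates": labels assumed to link an exposure
# concept to an outcome concept (illustrative only, not the authors' list).
EXPOSURE_TEMPLATES = {"causative_agent_of", "induces"}


@dataclass(frozen=True)
class Relation:
    source: str  # placeholder concept ID of the relation's source
    label: str   # relation label, in the style of a UMLS relation attribute
    target: str  # placeholder concept ID of the relation's target


def extract_exposure_pairs(relations):
    """Return (exposure, outcome) pairs whose relation label matches a template."""
    return {(r.source, r.target) for r in relations if r.label in EXPOSURE_TEMPLATES}


def exposome_concepts(pairs):
    """Collect every concept that participates in an exposure-outcome pair."""
    return {concept for pair in pairs for concept in pair}


def filter_annotations(annotations, concepts):
    """Keep (text, concept_id) annotations whose concept is in the exposome set."""
    return [a for a in annotations if a[1] in concepts]


def fraction_removed(annotations, kept):
    """Share of machine annotations discarded by the filter."""
    return 1 - len(kept) / len(annotations) if annotations else 0.0


# Toy input: two template-matching relations and one non-exposure relation.
relations = [
    Relation("EXP_A", "causative_agent_of", "DIS_X"),
    Relation("EXP_B", "induces", "DIS_Y"),
    Relation("DRUG_C", "may_treat", "DIS_X"),
]
pairs = extract_exposure_pairs(relations)
concepts = exposome_concepts(pairs)

# Toy machine annotations as (text span, concept ID) tuples.
annotations = [("silica dust", "EXP_A"), ("lung fibrosis", "DIS_X"), ("aspirin", "DRUG_C")]
kept = filter_annotations(annotations, concepts)

print(sorted(pairs))  # [('EXP_A', 'DIS_X'), ('EXP_B', 'DIS_Y')]
print(kept)           # [('silica dust', 'EXP_A'), ('lung fibrosis', 'DIS_X')]
print(round(fraction_removed(annotations, kept), 2))  # 0.33
```

In this toy run one of three annotations is discarded; the 78% reduction reported above comes from the study's own data and is not reproduced here. A set-membership filter keeps the check cheap, since each annotation is tested against the concept set in constant average time.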