Sfakianaki Pepi, Koumakis Lefteris, Sfakianakis Stelios, Iatraki Galatia, Zacharioudakis Giorgos, Graf Norbert, Marias Kostas, Tsiknakis Manolis
Foundation for Research and Technology Hellas (FORTH), Institute of Computer Science, N. Plastira 100, Vassilika Vouton, Heraklion, Crete, Greece.
Paediatric Haematology and Oncology, Saarland University Hospital, Homburg, Germany.
BMC Med Inform Decis Mak. 2015 Sep 30;15:77. doi: 10.1186/s12911-015-0200-4.
A large number of publicly available biomedical resources currently exist, and their number is increasing rapidly. In parallel, specialized repositories are being developed that index numerous clinical and biomedical tools. The main drawback of such repositories is the difficulty of locating appropriate resources for a clinical or biomedical decision task, especially for users who are not Information Technology experts. Moreover, although NLP research in the clinical domain has been active since the 1960s, progress in the development of NLP applications has been slow and lags behind progress in the general NLP domain. The aim of the present study is to investigate the use of semantics for annotating biomedical resources with domain-specific ontologies, and to exploit Natural Language Processing methods to enable non-Information Technology expert users to search efficiently for biomedical resources using natural language.
A Natural Language Processing engine has been implemented that can "translate" free text into targeted queries, automatically transforming a clinical research question into a request description that contains only ontology terms. The implementation is based on information extraction techniques for natural language text, guided by integrated ontologies. Furthermore, knowledge from robust text mining methods has been incorporated to map descriptions onto suitable domain ontologies, in order to ensure that the biomedical resource descriptions are domain oriented and to enhance the accuracy of service discovery. The framework is freely available as a web application at http://calchas.ics.forth.gr/.
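The idea of reducing a free-text question to ontology terms can be illustrated with a minimal sketch. This is not the engine described above: it assumes a plain dictionary lookup over a hypothetical toy ontology (`TOY_ONTOLOGY`, with made-up `TOY:` identifiers) and a hypothetical helper `to_ontology_terms`, whereas the actual framework relies on information extraction guided by integrated domain ontologies.

```python
import re
from typing import List

# Hypothetical toy ontology: surface phrase -> term ID (IDs are invented for illustration).
TOY_ONTOLOGY = {
    "survival analysis": "TOY:0001",
    "gene expression": "TOY:0002",
    "breast cancer": "TOY:0003",
}

def to_ontology_terms(question: str) -> List[str]:
    """Return the IDs of toy-ontology phrases found in the question, in text order."""
    # Normalise case and collapse whitespace before matching.
    text = re.sub(r"\s+", " ", question.lower())
    hits = []
    for phrase, term_id in TOY_ONTOLOGY.items():
        match = re.search(r"\b" + re.escape(phrase) + r"\b", text)
        if match:
            hits.append((match.start(), term_id))
    # Sort by position so the output follows the order of the question.
    return [term_id for _, term_id in sorted(hits)]

if __name__ == "__main__":
    print(to_ontology_terms("Does gene expression predict outcome in breast cancer?"))
```

The resulting list of term identifiers plays the role of the "request description that contains only ontology terms", which could then be matched against semantically annotated resources.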
For our experiments, a range of clinical questions was established based on descriptions of clinical trials from the ClinicalTrials.gov registry as well as recommendations from clinicians. Domain experts manually identified the tools available in a tools repository that are suitable for addressing the clinical questions at hand, either individually or as a set of tools forming a computational pipeline. These manual results were compared with those obtained from automated discovery of candidate biomedical tools. Precision and recall were used to evaluate the results. Our results indicate that the proposed framework has high precision and low recall, meaning that the system returns substantially more relevant results than irrelevant ones.
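For reference, precision and recall for a single clinical question can be computed as in the sketch below. It assumes that the expert-selected tools and the automatically discovered tools are available as sets of identifiers; the function name `precision_recall` and the tool names are hypothetical and are not taken from the study.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one question from two collections of tool IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: share of returned tools that the experts also selected.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: share of expert-selected tools that the system returned.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

if __name__ == "__main__":
    discovered = {"toolA", "toolB"}                        # hypothetical system output
    expert_choice = {"toolA", "toolB", "toolC", "toolD"}   # hypothetical expert selection
    print(precision_recall(discovered, expert_choice))
```

In this made-up example every returned tool is relevant (precision 1.0) but only half of the relevant tools are found (recall 0.5), which is the pattern of high precision and low recall described above.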
The biomedical ontologies already available, the existing NLP tools, and the quality of biomedical annotation systems are sufficient for implementing a biomedical resource discovery framework based on the semantic annotation of resources and the use of NLP techniques. The results of the present study demonstrate the clinical utility of the proposed framework, which aims to bridge the gap between a clinical question posed in natural language and efficient, dynamic discovery of biomedical resources.