Sander André, Wauer Roland
ID GmbH & Co. KGaA, Platz vor dem Neuen Tor 2, 10115, Berlin, Germany.
Klinik für Neonatologie, Charité-Universitätsmedizin Berlin, 10098, Berlin, Germany.
J Biomed Semantics. 2019 Apr 24;10(1):7. doi: 10.1186/s13326-019-0199-z.
Most electronic medical records still contain large amounts of free-text data. Semantic evaluation of such data requires that it either be encoded with adequate classifications or be transformed into a knowledge-based database.
We present an approach that allows databases accessible via SQL (Structured Query Language) to be searched directly with semantic queries, without the need for further transformations. To this end, we developed I) an extension to SQL, named Ontology-SQL (O-SQL), that permits the use of semantic expressions; II) a framework that uses a standard terminology server to annotate database tables containing free text; and III) a parser that rewrites O-SQL into SQL so that such queries can be passed to the database server.
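As a rough illustration of step III, the Python sketch below rewrites a semantic predicate into plain SQL that an unmodified relational engine can execute. Everything in it is an assumption and not the published O-SQL grammar or parser. The `CONCEPT(table, 'id')` predicate syntax, the `annotations` table (standing in for the output of the annotation step II), the `is_a` table holding a precomputed transitive closure of the ontology, and all concept identifiers are hypothetical. SQLite from the Python standard library stands in for the database server.

```python
import re
import sqlite3

# Hypothetical semantic predicate CONCEPT(<table>, '<concept_id>').
# This is NOT the published O-SQL syntax; it only illustrates the rewrite idea.
SEMANTIC_PRED = re.compile(r"CONCEPT\(\s*(\w+)\s*,\s*'([^']+)'\s*\)")


def rewrite(osql: str) -> str:
    """Replace every CONCEPT(table, 'id') predicate with a plain-SQL subquery.

    Assumes two hypothetical helper tables filled by an annotation step:
      annotations(tbl, row_id, concept_id)  -- concepts found in free text
      is_a(ancestor, descendant)            -- transitive closure, incl. self
    """
    def expand(m: re.Match) -> str:
        table, concept = m.group(1), m.group(2)
        # A row matches if any concept annotated in it is the requested
        # concept or one of its descendants in the (hypothetical) hierarchy.
        return (
            f"{table}.id IN ("
            "SELECT a.row_id FROM annotations a "
            "JOIN is_a h ON a.concept_id = h.descendant "
            f"WHERE a.tbl = '{table}' AND h.ancestor = '{concept}')"
        )

    return SEMANTIC_PRED.sub(expand, osql)


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with made-up free text and annotations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE notes (id INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE annotations (tbl TEXT, row_id INTEGER, concept_id TEXT);
        CREATE TABLE is_a (ancestor TEXT, descendant TEXT);
        INSERT INTO notes VALUES
            (1, 'suspected bacterial pneumonia'),
            (2, 'no findings'),
            (3, 'viral pneumonia');
        INSERT INTO annotations VALUES
            ('notes', 1, 'bacterial_pneumonia'),
            ('notes', 3, 'viral_pneumonia');
        INSERT INTO is_a VALUES
            ('pneumonia', 'pneumonia'),
            ('pneumonia', 'bacterial_pneumonia'),
            ('pneumonia', 'viral_pneumonia'),
            ('bacterial_pneumonia', 'bacterial_pneumonia'),
            ('viral_pneumonia', 'viral_pneumonia');
    """)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    sql = rewrite("SELECT id FROM notes WHERE CONCEPT(notes, 'pneumonia') ORDER BY id")
    print(sql)
    print(conn.execute(sql).fetchall())  # [(1,), (3,)]
```

Because the hierarchy is precomputed as a closure table in this sketch, the rewritten query stays a single join and the database server never needs to understand the ontology itself; in practice these helper tables would presumably be populated from the terminology server rather than by hand.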
I) We compared several previously published semantic queries and were able to reproduce them in a reduced, highly condensed form. II) The quality of the annotation process was measured against manual annotation, yielding a sensitivity of 97.62% and a specificity of 100.00%. III) Different semantic queries were analyzed and achieved F-scores between 0.91 and 0.98.
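The evaluation figures in II) and III) rest on standard confusion-matrix definitions. The minimal Python sketch below spells them out; it assumes the usual definitions and that "F-score" means the balanced F1 score (beta = 1), which the abstract does not state. The counts in the usage line are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing automatic output against a manual reference."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    def sensitivity(self) -> float:
        # Also called recall: share of reference positives that were found.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of reference negatives that were correctly left out.
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Share of reported positives that are correct.
        return self.tp / (self.tp + self.fp)

    def f_score(self, beta: float = 1.0) -> float:
        # F-beta score; beta = 1 gives the harmonic mean of precision and recall.
        p, r = self.precision(), self.sensitivity()
        b2 = beta * beta
        return (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    c = Confusion(tp=9, fp=1, tn=5, fn=1)  # hypothetical counts
    print(c.sensitivity(), c.specificity(), c.f_score())
```

Note that a specificity of 100.00% simply means no false positives (fp = 0) were observed against the manual reference.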
We showed that systematic analysis of medical records containing free text is possible with standard tools. The seamless connection of ontologies with standard database technologies is an important building block for the analysis of unstructured data. The developed technology can readily be applied to relationally organized data and supports the increasingly important field of translational research.