Greater Manchester Patient Safety Translational Research Centre, University of Manchester, Manchester, United Kingdom.
Division of Informatics, Imaging and Data Science, The University of Manchester, Manchester, United Kingdom.
PLoS One. 2019 Feb 14;14(2):e0212291. doi: 10.1371/journal.pone.0212291. eCollection 2019.
Clinical code sets are vital to research using routinely collected electronic healthcare data. Existing code set engineering methods pose significant limitations for reproducible research. To improve the transparency and reusability of research, these code sets must abide by the FAIR principles; this is not currently happening. We propose 'term sets', an equivalent alternative to code sets that is findable, accessible, interoperable and reusable.
We describe a new code set representation, consisting of natural language inclusion and exclusion terms (term sets), and explain its relationship to code sets. We formally prove that any code set has a corresponding term set. We demonstrate utility by searching for recently published code sets, representing them as term sets, and reporting on the number of inclusion and exclusion terms compared with the size of the code set.
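The relationship between a term set and a code set can be illustrated with a minimal sketch. It is not the authors' software. It assumes that a code belongs to the resulting code set when its description contains at least one inclusion term and no exclusion term, using case-insensitive substring matching. The `TermSet` class, the toy terminology, and the codes `X001` to `X007` are all hypothetical and were invented for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TermSet:
    """Hypothetical term set: natural language inclusion and exclusion terms."""

    inclusion: tuple[str, ...]
    exclusion: tuple[str, ...] = ()

    def matches(self, description: str) -> bool:
        # Assumed semantics: case-insensitive substring matching.
        text = description.lower()
        included = any(term.lower() in text for term in self.inclusion)
        excluded = any(term.lower() in text for term in self.exclusion)
        return included and not excluded

    def to_code_set(self, terminology: dict[str, str]) -> set[str]:
        # Resolve the term set against a terminology (code -> description).
        return {code for code, desc in terminology.items() if self.matches(desc)}

    def size(self) -> int:
        # Size of a term set: number of inclusion plus exclusion terms.
        return len(self.inclusion) + len(self.exclusion)


# Toy terminology with invented codes; not taken from any real coding system.
TOY_TERMINOLOGY = {
    "X001": "Type 1 diabetes mellitus",
    "X002": "Type 2 diabetes mellitus",
    "X003": "Diabetes insipidus",
    "X004": "Family history of diabetes mellitus",
    "X005": "Asthma",
    "X006": "Gestational diabetes mellitus",
    "X007": "Diabetes mellitus with neuropathy",
}

diabetes = TermSet(
    inclusion=("diabetes",),
    exclusion=("insipidus", "family history"),
)
code_set = diabetes.to_code_set(TOY_TERMINOLOGY)

print(sorted(code_set))
print(f"term set size: {diabetes.size()}, code set size: {len(code_set)}")
```

Because the term set stores only words and not codes, the same three terms could be resolved against a different terminology by passing a different dictionary to `to_code_set`. In this toy case, three terms stand in for four codes.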
Thirty-one code sets from 20 papers covering diverse disease domains were converted into term sets. On average, a term set was 74% of the size of its equivalent original code set. Four term sets were larger than their code sets because of deficiencies in the original code sets.
Term sets can concisely represent any code set. This may reduce barriers for examining and reusing code sets, which may accelerate research using healthcare databases. We have developed open-source software that supports researchers using term sets.
Term sets are independent of clinical code terminologies and therefore: enable reproducible research; are resistant to terminology changes; and are less error-prone as they are shorter than the equivalent code set.