School of Medicine, University of California San Francisco, San Francisco, California, USA.
Department of Neurological Surgery, University of California San Francisco, San Francisco, California, USA.
BMJ Health Care Inform. 2022 Apr;29(1). doi: 10.1136/bmjhci-2021-100532.
The transition from ICD-9 to ICD-10 coding creates a data standardisation challenge for large-scale longitudinal research. We sought to develop a programme that automated this standardisation process.
A programme was developed to standardise ICD-9 and ICD-10 terminology into one system. The code was then improved to reduce runtime, and both iterations were tested on a joint ICD-9/ICD-10 database of 15.8 million patients.
Both programmes successfully standardised diagnostic terminology in the database. While the original programme updated 100 000 cells in 12.5 hours, the improved programme translated 3.1 million cells in 38 min.
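To put the two reported runs on a common scale, the figures can be converted to cells processed per minute. The short calculation below is only a rough sketch: it assumes runtime scales linearly with the number of cells, which the abstract does not state, and the function name `cells_per_minute` is hypothetical.

```python
def cells_per_minute(cells: int, minutes: float) -> float:
    """Average throughput of a run, in cells per minute."""
    return cells / minutes

# Figures reported in the abstract.
original = cells_per_minute(100_000, 12.5 * 60)  # 12.5 hours = 750 minutes
improved = cells_per_minute(3_100_000, 38)

# Ratio of throughputs, assuming runtime is linear in the number of cells.
speedup = improved / original

print(f"original: {original:.0f} cells/min")
print(f"improved: {improved:.0f} cells/min")
print(f"speedup:  {speedup:.0f}x")
```

Under that linearity assumption, the improved programme processes cells roughly 600 times faster than the original.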
While both programmes successfully translated ICD-related data into a standardised format, the original programme suffered from excessive runtimes. Code improvement with hash tables and parallelisation reduced the runtime per cell by more than two orders of magnitude.
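The two techniques named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the crosswalk entries, the function names `translate_chunk` and `translate_parallel`, and the chunking parameters are all assumptions made for illustration. A dictionary (Python's built-in hash table) gives constant-time lookup per code on average, and the list of codes is split into chunks that separate worker processes translate independently.

```python
from concurrent.futures import ProcessPoolExecutor

# Toy ICD-9 -> ICD-10 crosswalk held in a dict (a hash table).
# Illustrative entries only; a real crosswalk would be loaded from a
# published mapping file rather than hard-coded.
ICD9_TO_ICD10 = {
    "401.9": "I10",     # essential hypertension
    "250.00": "E11.9",  # type 2 diabetes without complications
}

def translate_chunk(codes):
    """Translate one chunk of codes.

    Each lookup is a single hash-table probe. Codes that are not in the
    crosswalk (for example, codes already in ICD-10) pass through unchanged.
    """
    return [ICD9_TO_ICD10.get(code, code) for code in codes]

def translate_parallel(codes, workers=2, chunk_size=1000):
    """Split the codes into chunks and translate the chunks in parallel."""
    chunks = [codes[i:i + chunk_size] for i in range(0, len(codes), chunk_size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so output order matches input order.
        translated = pool.map(translate_chunk, chunks)
        return [code for chunk in translated for code in chunk]
```

Because the sketch uses worker processes, `translate_parallel` should be called from under an `if __name__ == "__main__":` guard on platforms that start workers by re-importing the main module. The chunks are independent of one another, which is what makes the work straightforward to distribute.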
Databases with ICD-9 and ICD-10 codes require terminology standardisation for analysis. By sharing our programme's implementation, we hope to assist other researchers in standardising their own databases.