Trofymenko Maksym, Korchmar Eduard, Kaduk Denys, Vikhrak Marta, Khilchevskyi Bohdan, Nesmiian Tetiana, Talapova Polina, Ved Max, Ageeva Inna
IT company SciForce, Kharkiv, Ukraine.
Taras Shevchenko National University of Kyiv, Kyiv, Ukraine.
Sci Rep. 2025 Jul 2;15(1):23674. doi: 10.1038/s41598-025-04046-9.
Accurately mapping complex health data to the OMOP CDM while preserving clinical nuance remains a challenge. We introduce Jackalope Plus, a novel tool that leverages SNOMED CT post-coordination and the GPT-4o mini large language model (LLM) to enhance the precision and efficiency of this process. Our two-step approach combines semantic search with LLM-driven standardization, enabling accurate conversion of intricate medical concepts. Evaluation on benchmark and custom datasets shows that Jackalope Plus identifies correct mappings for over 77.5% of complex terms, substantially outperforming Usagi (52.5%) and matching the accuracy of manual mapping while saving up to 50% of the time. Jackalope Plus offers a versatile solution for diverse healthcare data environments. Future work will focus on refining the tool by integrating user feedback and on resolving ambiguities in overlapping concepts. A free beta version is available for research and feedback. Ethical review confirms that no patient-identifiable information is stored.
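The two-step approach described in the abstract (semantic search followed by LLM-driven standardization) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the Jackalope Plus implementation: the vocabulary `VOCAB` and its identifiers are hypothetical placeholders rather than real SNOMED CT content, the similarity measure is a simple bag-of-words cosine rather than whatever embedding the tool uses, and `keyword_chooser` is a deterministic stand-in for the LLM call.

```python
import math
import re
from collections import Counter

# Hypothetical mini-vocabulary: IDs and names are placeholders,
# not real SNOMED CT concepts.
VOCAB = {
    "C001": "fracture of femur",
    "C002": "fracture of tibia",
    "C003": "left",
    "C004": "right",
    "C005": "asthma",
}


def tokens(text):
    """Lowercase bag-of-words representation of a term."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_search(term, top_k=3):
    """Step 1: rank vocabulary concepts by similarity to the source term."""
    query = tokens(term)
    scored = [(cosine(query, tokens(name)), cid) for cid, name in VOCAB.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by ID
    return [cid for score, cid in scored[:top_k] if score > 0]


def keyword_chooser(term, candidates):
    """Stand-in for the LLM: keep candidates fully contained in the term."""
    words = set(tokens(term))
    return [cid for cid, name in candidates.items() if set(tokens(name)) <= words]


def standardize(term, candidates, choose):
    """Step 2: let the chooser pick from the candidates; reject unknown IDs."""
    picked = choose(term, {cid: VOCAB[cid] for cid in candidates})
    return [cid for cid in picked if cid in candidates]


def map_term(term, choose=keyword_chooser):
    """Run both steps and return the selected concept IDs."""
    return standardize(term, semantic_search(term), choose)


if __name__ == "__main__":
    print(map_term("Fracture of left femur"))
```

In this toy setup a composite source term is resolved to a focus concept plus a qualifier, which is the general idea behind post-coordination. Filtering the chooser's output against the candidate list is one simple way to guard against a model returning identifiers that were never offered to it; in a real pipeline `choose` would wrap an actual LLM call.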