Department of Computer Science and Engineering, Kyung Hee University, Seocheon-dong, Giheung-gu, South Korea; Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad, Pakistan.
Department of Software, Sejong University, South Korea.
Int J Med Inform. 2019 Sep;129:133-145. doi: 10.1016/j.ijmedinf.2019.05.024. Epub 2019 Jun 7.
Standardized healthcare documents have a high adoption rate in today's hospital setup. This brings several challenges, as processing such documents at scale takes a toll on the infrastructure. The complexity of these documents compounds the difficulty of handling them, which is why big data techniques are necessary. However, the nature of big data techniques can cause accuracy/semantic loss in health documents when they are partitioned for processing. This semantic loss is critical for clinical use as well as for insurance and medical education.
In this paper we propose a novel technique to avoid the semantic loss that occurs during the conventional partitioning of healthcare documents in big data, through a constraint model based on conformance to the clinical document standard and user-based use cases. We ran Clinical Document Architecture (CDAR) datasets on the Hadoop Distributed File System (HDFS) through a uniquely configured setup. After partitioning, we identified the documents affected by semantic loss and separated them into two sets: conflict-free documents and conflicted documents. Conflicted documents were resolved using different resolution strategies mapped according to the CDAR specification. The first part of the technique focuses on identifying the type of conflict that arises in the blocks after partitioning. The second part focuses on mapping the conflicts to resolutions based on the constraints applied, depending on the validation and the user scenario.
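The first part of the technique, detecting conflicted blocks after partitioning, can be illustrated with a minimal, hypothetical sketch (not the paper's implementation): a CDAR document is an XML file, so a fixed-size block boundary can cut through an element; a block is flagged as conflicted when an element opened in it is not closed in it, or vice versa. The `partition` and `is_conflicted` helpers below are assumptions for illustration only.

```python
import re

def partition(doc: str, block_size: int) -> list[str]:
    """Split a document into fixed-size blocks, as a block-based
    file system such as HDFS would."""
    return [doc[i:i + block_size] for i in range(0, len(doc), block_size)]

def is_conflicted(block: str) -> bool:
    """A block is 'conflicted' if an XML element straddles a block
    boundary: an open tag without its close in the block, or a close
    tag without its open."""
    depth = 0
    for closing, _, self_closing in re.findall(
            r"<(/?)([A-Za-z][\w.-]*)[^>]*?(/?)>", block):
        if self_closing:
            continue  # <tag/> opens and closes inside the block
        depth += -1 if closing else 1
        if depth < 0:          # close tag with no matching open here
            return True
    return depth != 0          # unmatched opens left at block end

# Toy example: the 40-byte boundary falls inside an element.
doc = "<record><a>x</a></record>" * 4
blocks = partition(doc, 40)
conflicted = [b for b in blocks if is_conflicted(b)]
conflict_free = [b for b in blocks if not is_conflicted(b)]
```

Conflict-free blocks can then be processed independently, while conflicted blocks are routed to the resolution strategies described above.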
We used a publicly available dataset of CDAR documents, identified all conflicted documents, and resolved all of them successfully, avoiding any semantic loss. In our experiment we tested up to 87,000 CDAR documents and successfully identified the conflicts and resolved the semantic issues.
We have presented a novel study that focuses on the semantics of big data, one that does not compromise performance and resolves the semantic issues that arise during the processing of clinical documents.