Wang Fei
Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA.
J Biomed Inform. 2015 Jun;55:41-54. doi: 10.1016/j.jbi.2015.01.009. Epub 2015 Feb 3.
With the rapid development of information technologies, a tremendous amount of data has become readily available in various application domains. This big data era presents challenges to many conventional data analytics research directions, including data capture, storage, search, sharing, analysis, and visualization. It is no surprise that the success of next-generation healthcare systems relies heavily on the effective utilization of gigantic amounts of medical data. The ability to analyze big data in modern healthcare systems plays a vital role in improving the quality of care delivery. Specifically, patient similarity evaluation aims at estimating the clinical affinity and diagnostic proximity of patients. As one of the successful data-driven techniques adopted in healthcare systems, patient similarity evaluation plays a fundamental role in many healthcare research areas such as prognosis, risk assessment, and comparative effectiveness analysis. However, existing algorithms for patient similarity evaluation are inefficient in handling massive patient data. In this paper, we propose an Adaptive Semi-Supervised Recursive Tree Partitioning (ART) framework for large-scale patient indexing, such that patients with similar clinical or diagnostic patterns can be correctly and efficiently retrieved. The framework is designed for semi-supervised settings because it is crucial to leverage experts' supervision knowledge in medical scenarios, and such knowledge is fairly limited compared to the available data. Starting from the proposed ART framework, we discuss several specific instantiations and validate them on both benchmark and real-world healthcare data. Our results show that with the ART framework, patients can be efficiently and effectively indexed in the sense that (1) similar patients can be retrieved in a very short time; and (2) the retrieval performance can beat state-of-the-art indexing methods.
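The abstract does not spell out how ART chooses its partitions, so the following is only a rough sketch of the general idea of recursive tree partitioning for similarity retrieval, not the authors' method. It assumes an unsupervised stand-in: each node splits the patient vectors at the median of a random projection, whereas ART is described as semi-supervised. The names `build_tree`, `query`, and `leaf_size` are hypothetical and chosen for illustration.

```python
import random


def build_tree(points, ids, leaf_size=4, rng=None):
    """Recursively partition the vectors indexed by `ids`.

    Assumption: splits use a random projection cut at the median.
    This is an unsupervised placeholder, not the ART splitting rule.
    """
    rng = rng or random.Random(0)
    if len(ids) <= leaf_size:
        return {"leaf": list(ids)}
    dim = len(points[ids[0]])
    # Random direction along which this node separates the data.
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    proj = {i: sum(a * b for a, b in zip(w, points[i])) for i in ids}
    ordered = sorted(ids, key=proj.get)
    mid = len(ordered) // 2
    return {
        "w": w,
        "t": proj[ordered[mid]],  # median projection is the threshold
        "left": build_tree(points, ordered[:mid], leaf_size, rng),
        "right": build_tree(points, ordered[mid:], leaf_size, rng),
    }


def query(tree, x):
    """Descend to one leaf and return its ids as retrieval candidates."""
    node = tree
    while "leaf" not in node:
        score = sum(a * b for a, b in zip(node["w"], x))
        node = node["right"] if score >= node["t"] else node["left"]
    return node["leaf"]


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Synthetic 8-dimensional "patient" vectors, purely for illustration.
    patients = [[data_rng.random() for _ in range(8)] for _ in range(100)]
    index = build_tree(patients, list(range(len(patients))))
    print(query(index, patients[0]))
```

A query visits one node per tree level, so with balanced median splits the lookup cost grows with the depth of the tree rather than with the number of patients. The returned candidates are approximate neighbors and would normally be re-ranked by an exact similarity measure.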