Bonomi Luca, Fan Liyue, Jiang Xiaoqian
UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, United States of America.
Department of Computer Science, University of North Carolina at Charlotte, Charlotte, United States of America.
J Biomed Inform. 2021 Jan;113:103667. doi: 10.1016/j.jbi.2020.103667. Epub 2020 Dec 25.
Temporal medical data are increasingly integrated into the development of data-driven methods to deliver better healthcare. Searching such data for patterns can improve the detection of disease cases and facilitate the design of preemptive interventions. For example, specific temporal patterns could be used to recognize low-prevalence diseases, which are often under-diagnosed. However, searching for these patterns in temporal medical data is challenging, as the data are often noisy, complex, and large in scale. In this work, we propose an effective and efficient solution to search for patients who exhibit conditions that resemble an input query. We define a similarity notion based on the Longest Common Subsequence (LCSS), which measures the similarity between the query and a patient's temporal medical data and is robust to noise in the data. To address the high dimensionality of medical data, our solution uses locality-sensitive hashing to embed the recorded clinical events (e.g., medications and diagnosis codes) into compact signatures. To perform pattern search in large electronic health record (EHR) datasets, we propose a filtering approach based on tandem patterns, which effectively identifies candidate matches while discarding irrelevant data. Evaluations on a real-world dataset demonstrate that our solution is highly accurate while significantly accelerating the similarity search.
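To illustrate why an LCSS-based similarity tolerates noise, here is a minimal sketch using the textbook LCSS dynamic program. It is not the authors' method: it treats each record as a flat sequence of event codes, omits the hashing and filtering steps, and the function names, the normalization by query length, and the example codes are all assumptions made for illustration.

```python
from typing import Sequence


def lcss_length(query: Sequence[str], record: Sequence[str]) -> int:
    """Length of the longest common subsequence of two event sequences."""
    # prev[j] is the LCSS length between the query prefix handled so far
    # and the first j events of the record.
    prev = [0] * (len(record) + 1)
    for q in query:
        curr = [0]
        for j, r in enumerate(record, start=1):
            if q == r:
                # Matching events extend the best subsequence ending before both.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise skip one event from either the query or the record.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcss_similarity(query: Sequence[str], record: Sequence[str]) -> float:
    """Fraction of the query matched in order (assumed normalization)."""
    if not query:
        return 0.0
    return lcss_length(query, record) / len(query)


# Hypothetical event codes, for illustration only.
query = ["E11", "metformin", "N18"]
noisy_record = ["E11", "I10", "metformin", "J06", "N18"]
reordered_record = ["metformin", "E11"]

full_match = lcss_similarity(query, noisy_record)
partial_match = lcss_similarity(query, reordered_record)
```

In this sketch the unrelated events interleaved in `noisy_record` do not lower the score, because LCSS skips unmatched elements, while `reordered_record` scores lower because the order of events matters.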