Noise-tolerant similarity search in temporal medical data.

Authors

Bonomi Luca, Fan Liyue, Jiang Xiaoqian

Affiliations

UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, United States of America.

Department of Computer Science, University of North Carolina at Charlotte, Charlotte, United States of America.

Publication information

J Biomed Inform. 2021 Jan;113:103667. doi: 10.1016/j.jbi.2020.103667. Epub 2020 Dec 25.

Abstract

Temporal medical data are increasingly integrated into the development of data-driven methods to deliver better healthcare. Searching such data for patterns can improve the detection of disease cases and facilitate the design of preemptive interventions. For example, specific temporal patterns could be used to recognize low-prevalence diseases, which are often under-diagnosed. However, searching for these patterns in temporal medical data is challenging, as the data are often noisy, complex, and large in scale. In this work, we propose an effective and efficient solution to search for patients who exhibit conditions that resemble the input query. Our solution defines a similarity notion based on the Longest Common Subsequence (LCSS), which measures the similarity between the query and the patient's temporal medical data and ensures robustness against noise in the data. To address the high dimensionality of medical data, it adopts locality-sensitive hashing techniques that embed the recorded clinical events (e.g., medications and diagnosis codes) into compact signatures. To perform pattern search in large electronic health record (EHR) datasets, we propose a filtering approach based on tandem patterns, which effectively identifies candidate matches while discarding irrelevant data. Evaluations conducted on a real-world dataset demonstrate that our solution is highly accurate while significantly accelerating the similarity search.
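To make the two core ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each patient record is a time-ordered list of visits, each visit being a set of clinical event codes, and it treats two visits as matching when their Jaccard similarity reaches a threshold. MinHash is used here as one well-known locality-sensitive hashing family for set similarity; the abstract does not say which family the paper uses. All names (`Visit`, `minhash_signature`, `lcss_similarity`, the `threshold` parameter, and the example codes) are hypothetical.

```python
import hashlib
from typing import FrozenSet, List, Sequence

# Assumption: one visit = the set of event codes recorded at one time point.
Visit = FrozenSet[str]


def jaccard(a: Visit, b: Visit) -> float:
    """Jaccard similarity of two event sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(events: Visit, num_hashes: int = 64) -> List[int]:
    """Compact signature: for each seeded hash, keep the minimum value over the events."""
    if not events:
        raise ValueError("cannot build a MinHash signature for an empty visit")
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{e}".encode(), digest_size=8).digest(), "big")
            for e in events))
    return signature


def signature_similarity(s1: Sequence[int], s2: Sequence[int]) -> float:
    """Fraction of agreeing positions, an estimate of the Jaccard similarity."""
    return sum(x == y for x, y in zip(s1, s2)) / len(s1)


def lcss_length(query: Sequence[Visit], record: Sequence[Visit],
                threshold: float = 0.5) -> int:
    """Longest common subsequence length, where visits match if Jaccard >= threshold."""
    m, n = len(query), len(record)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if jaccard(query[i - 1], record[j - 1]) >= threshold:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


def lcss_similarity(query: Sequence[Visit], record: Sequence[Visit],
                    threshold: float = 0.5) -> float:
    """LCSS length normalized by the query length (0.0 for an empty query)."""
    if not query:
        return 0.0
    return lcss_length(query, record, threshold) / len(query)


# Illustrative data only: the middle visit of the record is noise and is skipped.
query = [frozenset({"E11", "metformin"}), frozenset({"N18"})]
record = [frozenset({"E11", "metformin", "I10"}),
          frozenset({"J06"}),
          frozenset({"N18"})]
print(lcss_similarity(query, record))
```

Because LCSS allows unmatched elements on either side, the unrelated middle visit does not lower the score, which is the sense in which the measure tolerates noise. In a larger system the exact `jaccard` call could be replaced by `signature_similarity` on precomputed signatures, trading some accuracy for speed; the tandem-pattern filtering step mentioned in the abstract is not sketched here.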

Similar articles

4
Nonlinear Asymmetric Multi-Valued Hashing.
IEEE Trans Pattern Anal Mach Intell. 2019 Nov;41(11):2660-2676. doi: 10.1109/TPAMI.2018.2867866. Epub 2018 Aug 30.
5
Semi-supervised hashing for large-scale search.
IEEE Trans Pattern Anal Mach Intell. 2012 Dec;34(12):2393-2406. doi: 10.1109/TPAMI.2012.48.
7
In Defense of Locality-Sensitive Hashing.
IEEE Trans Neural Netw Learn Syst. 2018 Jan;29(1):87-103. doi: 10.1109/TNNLS.2016.2615085. Epub 2016 Oct 24.
8
Asymmetric distances for binary embeddings.
IEEE Trans Pattern Anal Mach Intell. 2014 Jan;36(1):33-47. doi: 10.1109/TPAMI.2013.101.
9
Scalable Supervised Asymmetric Hashing With Semantic and Latent Factor Embedding.
IEEE Trans Image Process. 2019 Oct;28(10):4803-4818. doi: 10.1109/TIP.2019.2912290. Epub 2019 May 8.
