Clustering Demographics and Sequences of Diagnosis Codes.

Publication information

IEEE J Biomed Health Inform. 2022 May;26(5):2351-2359. doi: 10.1109/JBHI.2021.3129461. Epub 2022 May 5.

Abstract

A Relational-Sequential dataset (or RS-dataset for short) contains records, each composed of a patient's values in demographic attributes and their sequence of diagnosis codes. The task of clustering an RS-dataset is helpful for analyses ranging from pattern mining to classification. However, existing methods are not appropriate for this task. Thus, we initiate a study of how an RS-dataset can be clustered effectively and efficiently. We formalize the task of clustering an RS-dataset as an optimization problem. At the heart of the problem is a distance measure we design to quantify the pairwise similarity between records of an RS-dataset. Our measure uses a tree structure that encodes hierarchical relationships between records, based on their demographics, as well as an edit-distance-like measure that captures both the sequentiality and the semantic similarity of diagnosis codes. We also develop an algorithm that first identifies k representative records (centers), for a given k, and then constructs k clusters, each containing one center and the records that are closer to that center than to any other center. Experiments using two Electronic Health Record datasets demonstrate that our algorithm constructs compact and well-separated clusters, which preserve meaningful relationships between demographics and sequences of diagnosis codes, while being efficient and scalable.
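The two-stage idea in the abstract (pick k centers, then assign each record to its nearest center under a combined demographic and sequence distance) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not give the tree-based demographic measure, the semantic cost, or the center-selection rule, so everything below is an assumption made for illustration. In particular, `demo_distance` (fraction of mismatching attributes) stands in for the tree-based measure, `code_cost` (a shared 3-character code prefix counts as "semantically close") stands in for the semantic similarity, the weight `w` is hypothetical, and the centers are chosen with a plain farthest-first heuristic.

```python
from typing import List, Sequence, Tuple

# A record: (demographic attribute values, ordered diagnosis codes).
Record = Tuple[Tuple[str, ...], Tuple[str, ...]]


def code_cost(a: str, b: str) -> float:
    """Assumed semantic substitution cost: 0 if equal, 0.5 if the codes
    share their first three characters, 1 otherwise."""
    if a == b:
        return 0.0
    return 0.5 if a[:3] == b[:3] else 1.0


def seq_distance(s: Sequence[str], t: Sequence[str]) -> float:
    """Edit distance with semantic substitution costs, scaled to [0, 1]."""
    m, n = len(s), len(t)
    if max(m, n) == 0:
        return 0.0
    prev = [float(j) for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [float(i)] + [0.0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + 1.0,                                # delete s[i-1]
                cur[j - 1] + 1.0,                             # insert t[j-1]
                prev[j - 1] + code_cost(s[i - 1], t[j - 1]),  # substitute
            )
        prev = cur
    return prev[n] / max(m, n)


def demo_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """Stand-in for the tree-based measure: fraction of differing attributes."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def rs_distance(r1: Record, r2: Record, w: float = 0.5) -> float:
    """Weighted combination of the two parts; w is a hypothetical parameter."""
    return w * demo_distance(r1[0], r2[0]) + (1 - w) * seq_distance(r1[1], r2[1])


def cluster(records: List[Record], k: int) -> Tuple[List[int], List[int]]:
    """Farthest-first center selection, then nearest-center assignment."""
    centers = [0]  # start from the first record (arbitrary choice)
    while len(centers) < k:
        # Next center: the record farthest from its closest existing center.
        far = max(
            range(len(records)),
            key=lambda i: min(rs_distance(records[i], records[c]) for c in centers),
        )
        centers.append(far)
    labels = [
        min(range(k), key=lambda j: rs_distance(r, records[centers[j]]))
        for r in records
    ]
    return centers, labels


# Toy records, invented for illustration only.
records: List[Record] = [
    (("F", "30-39"), ("E11.9", "I10")),
    (("F", "30-39"), ("E11.8", "I10")),
    (("M", "60-69"), ("J45.0", "J45.1", "K21")),
    (("M", "60-69"), ("J45.0", "K21")),
]
centers, labels = cluster(records, k=2)
print(centers, labels)
```

On the toy data, the two records with matching demographics and near-identical code sequences end up in the same cluster. A faithful implementation would replace the two stand-in distances with the measures defined in the paper.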

Similar articles

Clustering datasets with demographics and diagnosis codes.

J Biomed Inform. 2020 Feb;102:103360. doi: 10.1016/j.jbi.2019.103360. Epub 2020 Jan 3.

Anonymizing datasets with demographics and diagnosis codes in the presence of utility constraints.

J Biomed Inform. 2017 Jan;65:76-96. doi: 10.1016/j.jbi.2016.11.001. Epub 2016 Nov 8.

Clustering clinical models from local electronic health records based on semantic similarity.

J Biomed Inform. 2015 Apr;54:294-304. doi: 10.1016/j.jbi.2014.12.015. Epub 2014 Dec 31.

GO functional similarity clustering depends on similarity measure, clustering method, and annotation completeness.

BMC Bioinformatics. 2019 Mar 27;20(1):155. doi: 10.1186/s12859-019-2752-2.

Efficient sequential and parallel algorithms for record linkage.

J Am Med Inform Assoc. 2014 Mar-Apr;21(2):252-62. doi: 10.1136/amiajnl-2013-002034. Epub 2013 Oct 23.

Optimizing research in symptomatic uterine fibroids with development of a computable phenotype for use with electronic health records.

Am J Obstet Gynecol. 2018 Jun;218(6):610.e1-610.e7. doi: 10.1016/j.ajog.2018.02.002. Epub 2018 Feb 9.

Identifying and characterizing highly similar notes in big clinical note datasets.

J Biomed Inform. 2018 Jun;82:63-69. doi: 10.1016/j.jbi.2018.04.009. Epub 2018 Apr 19.

Advanced methods for missing values imputation based on similarity learning.

PeerJ Comput Sci. 2021 Jul 21;7:e619. doi: 10.7717/peerj-cs.619. eCollection 2021.

An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records.

Artif Intell Med. 2015 Oct;65(2):155-66. doi: 10.1016/j.artmed.2015.04.007. Epub 2015 May 15.
