Department of Information Science, College of Computing & Informatics, Drexel University, Philadelphia, Pennsylvania, USA.
Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA.
J Am Med Inform Assoc. 2020 Apr 1;27(4):558-566. doi: 10.1093/jamia/ocaa005.
This study introduces a temporal condition pattern mining methodology that addresses the sparse use of coded condition concepts in electronic health record data. As a validation study, we applied this method to reveal condition patterns surrounding an initial diagnosis of pediatric asthma.
The SPADE (Sequential PAttern Discovery using Equivalence classes) algorithm was used to identify common temporal condition patterns surrounding the initial diagnosis of pediatric asthma in a study population of 71 824 patients from the Children's Hospital of Philadelphia. SPADE was applied to a dataset with diagnoses coded using International Classification of Diseases (ICD) concepts and separately to a dataset with the ICD codes mapped to their corresponding expanded diagnostic clusters (EDCs). Common temporal condition patterns surrounding the initial diagnosis of pediatric asthma ascertained by SPADE from both the ICD and EDC datasets were compared.
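The core idea behind SPADE can be illustrated with a simplified sketch: each diagnosis gets a vertical id-list of (patient, time) pairs, a pattern's support is the number of distinct patients in its id-list, and longer patterns are grown depth-first from a shared prefix (an equivalence class) by temporal joins that keep only occurrences strictly later than the prefix. This is a minimal sketch under stated assumptions, not the authors' implementation or any library's API: each patient is assumed to be a list of (time, diagnosis) events, only one diagnosis per step is handled (no same-time itemsets), prefixes are extended by joining with frequent single-diagnosis id-lists rather than with sibling id-lists as in the full algorithm, and the function names and toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input shape: patient id -> list of (time, diagnosis) events.
SequenceDB = Dict[str, List[Tuple[int, str]]]
# Vertical id-list: (patient id, time of the pattern's last element) pairs.
IdList = List[Tuple[str, int]]


def vertical_format(db: SequenceDB) -> Dict[str, IdList]:
    """Convert per-patient sequences into one id-list per diagnosis."""
    idlists: Dict[str, IdList] = defaultdict(list)
    for sid, events in db.items():
        for eid, item in events:
            idlists[item].append((sid, eid))
    return dict(idlists)


def support(idlist: IdList) -> int:
    """Support = number of distinct patients containing the pattern."""
    return len({sid for sid, _ in idlist})


def temporal_join(prefix: IdList, item: IdList) -> IdList:
    """Id-list for 'prefix -> item': keep item occurrences that come strictly
    after some occurrence of the prefix in the same patient. Comparing against
    the earliest prefix occurrence per patient is sufficient."""
    earliest: Dict[str, int] = {}
    for sid, eid in prefix:
        if sid not in earliest or eid < earliest[sid]:
            earliest[sid] = eid
    return [(sid, eid) for sid, eid in item if sid in earliest and eid > earliest[sid]]


def spade(db: SequenceDB, min_support: int) -> Dict[Tuple[str, ...], int]:
    """Return every frequent sequential pattern with its patient-level support."""
    # Only frequent single diagnoses can appear in a frequent longer pattern.
    atoms = {item: lst for item, lst in vertical_format(db).items()
             if support(lst) >= min_support}
    results: Dict[Tuple[str, ...], int] = {}

    def enumerate_class(prefix: Tuple[str, ...], prefix_list: IdList) -> None:
        # Depth-first search within the equivalence class sharing this prefix.
        results[prefix] = support(prefix_list)
        for item, item_list in atoms.items():
            joined = temporal_join(prefix_list, item_list)
            if support(joined) >= min_support:
                enumerate_class(prefix + (item,), joined)

    for item, lst in atoms.items():
        enumerate_class((item,), lst)
    return results


if __name__ == "__main__":
    # Toy data with placeholder diagnosis labels A, B, C (not clinical findings).
    toy_db: SequenceDB = {
        "p1": [(0, "B"), (5, "A"), (9, "C")],
        "p2": [(1, "B"), (3, "A")],
        "p3": [(2, "A"), (4, "B")],
    }
    for pattern, sup in spade(toy_db, min_support=2).items():
        print(" -> ".join(pattern), sup)
```

With a minimum support of 2 patients, the toy run reports B, A, and the temporal pattern B -> A (support 2, from p1 and p2); C is pruned because only one patient has it. Anchoring patterns around a specific index diagnosis, as in the study, would be an additional filtering step not shown here.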
SPADE identified 36 unique diagnoses in the mapped EDC dataset, whereas only 19 were recognized in the ICD dataset. Temporal trends in condition diagnoses ascertained from the EDC data were not discoverable in the ICD dataset.
Mining frequent temporal condition patterns from large electronic health record datasets may reveal previously unknown associations between diagnoses that could inform future research into causation or other relationships. Mapping sparsely coded medical concepts into homogeneous groups was essential to discovering potentially useful information from our dataset.
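A toy illustration of why grouping sparse codes can matter for frequent pattern mining: when support is counted per patient, several rare fine-grained codes may each fall below the threshold while the group they map to clears it. This is a minimal sketch assuming a simple many-to-one code-to-group dictionary; the codes, the group label, and the threshold are hypothetical placeholders, not actual ICD codes or EDC categories.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical per-patient diagnosis codes; each fine-grained code is rare.
raw_db: Dict[str, List[str]] = {
    "p1": ["code_1", "code_9"],
    "p2": ["code_2"],
    "p3": ["code_3", "code_9"],
    "p4": ["code_4"],
}

# Hypothetical many-to-one grouping standing in for an ICD-to-EDC crosswalk.
code_to_group: Dict[str, str] = {
    "code_1": "GROUP_X",
    "code_2": "GROUP_X",
    "code_3": "GROUP_X",
    "code_4": "GROUP_X",
}


def map_codes(db: Dict[str, List[str]], mapping: Dict[str, str]) -> Dict[str, List[str]]:
    """Replace each code with its group label; unmapped codes are kept as-is."""
    return {pid: [mapping.get(c, c) for c in codes] for pid, codes in db.items()}


def patient_support(db: Dict[str, List[str]]) -> Counter:
    """Count the number of patients (not events) carrying each code."""
    return Counter(code for codes in db.values() for code in set(codes))


def frequent(db: Dict[str, List[str]], min_support: int) -> Dict[str, int]:
    """Keep only codes seen in at least min_support patients."""
    return {code: n for code, n in patient_support(db).items() if n >= min_support}


if __name__ == "__main__":
    print("raw:", frequent(raw_db, 3))
    print("mapped:", frequent(map_codes(raw_db, code_to_group), 3))
```

At a threshold of 3 patients, no raw code is frequent, whereas the mapped group appears in all 4 patients. The trade-off is granularity: distinctions among the grouped codes are no longer visible in the mined patterns.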
We expect that the presented methodology is applicable to the study of diagnostic trajectories for other clinical conditions and can be extended to study temporal patterns of other coded medical concepts such as medications and procedures.