Department of Computer Science, Stony Brook University, Stony Brook, New York, United States of America.
Department of Family, Population and Preventive Medicine, Stony Brook University, Stony Brook, New York, United States of America.
PLoS One. 2018 Apr 26;13(4):e0194407. doi: 10.1371/journal.pone.0194407. eCollection 2018.
The goal of this study is to discover disease co-occurrence and sequence patterns from large-scale cancer diagnosis histories in New York State. In particular, we want to identify disparities among different patient groups. Our study will provide essential knowledge for clinical researchers to further investigate comorbidities and disease progression, with the aim of improving the management of multiple diseases. We used inpatient discharge and outpatient visit records from the New York State Statewide Planning and Research Cooperative System (SPARCS) from 2011 to 2015. We grouped each patient's visit history to generate diagnosis sequences for the seven most common cancer types. We mined frequent disease co-occurrences using the Apriori algorithm and discovered frequent disease sequence patterns using the cSPADE algorithm. Different types of cancer demonstrated distinct patterns. Disparities in both disease co-occurrence and sequence patterns were observed among patients in different age groups. There were also considerable disparities in disease co-occurrence patterns across claim types (i.e., inpatient, outpatient, emergency department and ambulatory surgery). Gender disparities were found mostly where the cancer types were gender specific. The support of most patterns was usually higher for males than for females. Primary diagnosis codes gave more stable results than secondary diagnosis codes. Two disease sequences consisting of the same diagnoses but in different orders usually had different supports. Our results suggest that the methods adopted can generate potentially interesting and clinically meaningful disease co-occurrence and sequence patterns, and identify disparities among various patient groups. These patterns could imply comorbidities and disease progressions.
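To make the notion of support concrete, the following is a minimal sketch that counts support by brute force on toy data. It is not the Apriori or cSPADE implementation used in the study; it only illustrates the quantity those algorithms threshold on. The patient records, diagnosis labels, and the function names `cooccurrence_support` and `sequence_support` are hypothetical, and treating co-occurrence at the patient level (rather than per visit) is an assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each patient is an ordered list of visits,
# and each visit is a set of diagnosis labels (not real SPARCS records).
patients = [
    [{"hypertension"}, {"breast_cancer", "anemia"}],
    [{"breast_cancer"}, {"hypertension"}],
    [{"hypertension", "diabetes"}, {"breast_cancer"}],
    [{"hypertension"}, {"diabetes"}, {"breast_cancer"}],
]


def cooccurrence_support(patients, size=2):
    """Fraction of patients whose history contains every diagnosis in an itemset.

    Assumption: co-occurrence is counted per patient, ignoring visit order.
    """
    counts = Counter()
    for visits in patients:
        seen = set().union(*visits)  # all diagnoses this patient ever had
        for itemset in combinations(sorted(seen), size):
            counts[itemset] += 1
    n = len(patients)
    return {itemset: count / n for itemset, count in counts.items()}


def sequence_support(patients, first, second):
    """Fraction of patients with `first` in a visit strictly before one with `second`."""
    hits = 0
    for visits in patients:
        # Earliest visit containing `first`; any later visit may contain `second`.
        first_idx = next((i for i, v in enumerate(visits) if first in v), None)
        if first_idx is None:
            continue
        if any(second in v for v in visits[first_idx + 1:]):
            hits += 1
    return hits / len(patients)


pair = ("breast_cancer", "hypertension")
print(cooccurrence_support(patients)[pair])
print(sequence_support(patients, "hypertension", "breast_cancer"))
print(sequence_support(patients, "breast_cancer", "hypertension"))
```

On this toy data the unordered pair appears in every patient history, while the two orderings of the same pair have supports of 0.75 and 0.25. This shows how sequences made of the same diagnoses can differ in support once order is taken into account.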