Software and Information Systems Engineering, Ben-Gurion University, Beer Sheva, Israel.
J Biomed Inform. 2017 Nov;75:83-95. doi: 10.1016/j.jbi.2017.10.002. Epub 2017 Oct 4.
Increasingly, frequent temporal patterns discovered in longitudinal patient records are proposed as features for classification and prediction, and as a means of clustering patients' clinical trajectories. However, to justify such uses, we must demonstrate that most frequent temporal patterns are indeed consistently discoverable in the records of different patient subsets drawn from similar patient populations. We have developed several measures for the consistency of the discovery of temporal patterns. We focus on time-interval relations patterns (TIRPs) that can be discovered within different subsets of the same patient population. We expect the discovered TIRPs (1) to be frequent in each subset, (2) to preserve their "local" metrics, namely the absolute frequency of each pattern, as measured by a Proportion Test, and (3) to preserve their "global" characteristics, namely their overall distribution, as measured by a Kolmogorov-Smirnov test. We also examined, over a variety of settings, the effect on consistency of varying the minimal frequency threshold for TIRP discovery, and of using a TIRP-filtering criterion that we previously introduced, the Semantic Adjacency Criterion (SAC). We applied our methodology to three medical domains (oncology, infectious hepatitis, and diabetes). We found that, within the minimal frequency ranges we examined, 70-95% of the discovered TIRPs were consistently discoverable, and 40-48% of them maintained their local frequency. The similarity of the TIRPs' global distribution varied widely, from 0% to 65%. Increasing the threshold usually increased both the percentage of TIRPs that were repeatedly discovered across different patient subsets within the same domain and the probability of a similar TIRP distribution. For most minimal support levels, using the SAC principle increased the percentage of repeating TIRPs, their local consistency, and their global consistency. The effect of using the SAC grew stronger as the minimal frequency threshold was raised.
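The three consistency checks described above (repeated discovery, a proportion test per pattern, and a Kolmogorov-Smirnov test over the pattern distribution) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the input layout (a dictionary from a TIRP label to the number of supporting patients), the example counts, the pooled two-proportion z-test, the asymptotic Kolmogorov-Smirnov p-value, and the way each rate is normalized are all assumptions made for illustration.

```python
import bisect
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided pooled z-test for equality of two proportions."""
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # identical degenerate proportions
    z = (hits_a / n_a - hits_b / n_b) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def ks_p_value(d, n_a, n_b, terms=100):
    """Asymptotic p-value from the Kolmogorov series (rough for small samples)."""
    lam = d * math.sqrt(n_a * n_b / (n_a + n_b))
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the true value is ~1
    total = sum((-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * total))


def consistency_report(support_a, n_a, support_b, n_b,
                       min_support=0.2, alpha=0.05):
    """Compare the TIRPs discovered in two patient subsets (illustrative)."""
    frequent_a = {t for t, c in support_a.items() if c / n_a >= min_support}
    frequent_b = {t for t, c in support_b.items() if c / n_b >= min_support}
    repeated = frequent_a & frequent_b

    # Assumption: repetition is measured relative to subset A's TIRPs.
    repeat_rate = len(repeated) / len(frequent_a) if frequent_a else 0.0

    # "Local" consistency: the proportion test does not reject equality.
    local_ok = [t for t in repeated
                if two_proportion_z_test(support_a[t], n_a,
                                         support_b[t], n_b)[1] >= alpha]
    local_rate = len(local_ok) / len(repeated) if repeated else 0.0

    # "Global" consistency: KS test over the repeated TIRPs' support values.
    global_p = None
    if repeated:
        sup_a = [support_a[t] / n_a for t in repeated]
        sup_b = [support_b[t] / n_b for t in repeated]
        global_p = ks_p_value(ks_statistic(sup_a, sup_b),
                              len(sup_a), len(sup_b))
    return {"repeat_rate": repeat_rate, "local_rate": local_rate,
            "global_p": global_p}


# Hypothetical supporting-patient counts in two subsets of 100 patients each.
subset_a = {"A before B": 60, "A overlaps C": 45, "B meets C": 30, "C contains D": 12}
subset_b = {"A before B": 58, "A overlaps C": 25, "B meets C": 15, "C contains D": 25}
report = consistency_report(subset_a, 100, subset_b, 100)
print(report)
```

Raising `min_support` in this sketch shrinks the set of frequent TIRPs in each subset, which is the parameter whose effect on consistency the abstract reports. A filtering step such as the SAC would be applied before the counts are passed in and is not modeled here.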