Manukyan Narine, Eppstein Margaret J, Horbar Jeffrey D, Leahy Kathleen A, Kenny Michael J, Mukherjee Shreya, Rizzo Donna M
Int J Adv Comput Sci. 2013 Jul;3(7):322-329.
We introduce a new method for exploratory analysis of large data sets with time-varying features, where the aim is to automatically discover novel relationships between features (over some time period) that are predictive of any of a number of time-varying outcomes (over some other time period). Using a genetic algorithm, we co-evolve (i) a subset of predictive features, (ii) the attribute to be predicted, (iii) the time period over which to assess the predictive features, and (iv) the time period over which to assess the predicted attribute. After validating the method on 15 synthetic test problems, we applied it to exploratory analysis of a large healthcare network data set. We discovered a strong association, with 100% sensitivity, between hospital participation in multi-institutional quality improvement collaboratives during or before 2002 and changes in risk-adjusted rates of mortality and morbidity observed after a 1–2-year lag. The proposed approach is a potentially powerful, general tool for exploratory analysis of a wide range of time-series data sets.
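The abstract names the four co-evolved components but not the encoding, genetic operators, or fitness function. The following is a minimal sketch, assuming a chromosome that holds a feature bit mask, an outcome index, and two `[start, end)` time windows. All names here (`Chromosome`, `fitness`, `evolve`, `make_synthetic`) are hypothetical. The fitness function (absolute Pearson correlation across entities) and truncation selection are stand-ins chosen for illustration, not the authors' method.

```python
import random
from dataclasses import dataclass

# Hypothetical toy dimensions (the real data set's sizes are not given).
N_FEATURES, N_OUTCOMES, N_STEPS = 6, 3, 10

@dataclass
class Chromosome:
    features: list       # (i) bit mask over candidate predictive features
    outcome: int         # (ii) index of the attribute to predict
    pred_window: tuple   # (iii) [start, end) steps for the predictive features
    out_window: tuple    # (iv) [start, end) steps for the predicted attribute

def random_window(rng):
    start = rng.randrange(N_STEPS)
    return (start, rng.randrange(start + 1, N_STEPS + 1))

def ensure_nonempty(mask, rng):
    # A chromosome must select at least one predictive feature.
    if not any(mask):
        mask[rng.randrange(N_FEATURES)] = True
    return mask

def random_chromosome(rng):
    mask = ensure_nonempty([rng.random() < 0.5 for _ in range(N_FEATURES)], rng)
    return Chromosome(mask, rng.randrange(N_OUTCOMES), random_window(rng), random_window(rng))

def window_mean(series, window):
    start, end = window
    return sum(series[start:end]) / (end - start)

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5

def fitness(ch, X, Y):
    # Assumed fitness: |Pearson r| across entities between the mean of the selected
    # features over pred_window and the chosen outcome's mean over out_window.
    # Layout: X[entity][feature][step], Y[entity][outcome][step].
    scores, targets = [], []
    for feats, outs in zip(X, Y):
        chosen = [window_mean(feats[f], ch.pred_window) for f in range(N_FEATURES) if ch.features[f]]
        scores.append(sum(chosen) / len(chosen))
        targets.append(window_mean(outs[ch.outcome], ch.out_window))
    return abs(pearson(scores, targets))

def crossover(a, b, rng):
    # Uniform crossover: each of the four co-evolved components comes from either parent.
    mask = ensure_nonempty([rng.choice(p) for p in zip(a.features, b.features)], rng)
    return Chromosome(mask, rng.choice((a.outcome, b.outcome)),
                      rng.choice((a.pred_window, b.pred_window)),
                      rng.choice((a.out_window, b.out_window)))

def mutate(ch, rng, rate=0.1):
    # Flip feature bits and occasionally resample the outcome or either window.
    mask = ensure_nonempty([(not f) if rng.random() < rate else f for f in ch.features], rng)
    outcome = rng.randrange(N_OUTCOMES) if rng.random() < rate else ch.outcome
    pw = random_window(rng) if rng.random() < rate else ch.pred_window
    ow = random_window(rng) if rng.random() < rate else ch.out_window
    return Chromosome(mask, outcome, pw, ow)

def evolve(X, Y, pop_size=30, generations=20, seed=0):
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection (assumed): keep the fitter half unchanged, refill with offspring.
        parents = sorted(pop, key=lambda c: fitness(c, X, Y), reverse=True)[: pop_size // 2]
        children = [mutate(crossover(*rng.sample(parents, 2), rng), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=lambda c: fitness(c, X, Y))

def make_synthetic(n_entities=50, seed=1):
    # Hypothetical toy data with one planted lagged link: the mean of feature 2
    # over steps 0-3 reappears as outcome 1 over steps 6-9.
    rng = random.Random(seed)
    X, Y = [], []
    for _ in range(n_entities):
        feats = [[rng.gauss(0, 1) for _ in range(N_STEPS)] for _ in range(N_FEATURES)]
        outs = [[rng.gauss(0, 1) for _ in range(N_STEPS)] for _ in range(N_OUTCOMES)]
        signal = sum(feats[2][0:4]) / 4
        for t in range(6, N_STEPS):
            outs[1][t] = signal
        X.append(feats)
        Y.append(outs)
    return X, Y
```

On the toy data, `evolve(*make_synthetic())` should tend to move toward feature 2, outcome 1, and windows near `(0, 4)` and `(6, 10)`, though a short run is not guaranteed to find them. Because the abstract does not say the predictor period must precede the outcome period, this sketch evolves the two windows independently; enforcing a lag would be a one-line constraint in `random_window`.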