Department of Pediatrics, University of Michigan Medical School, Ann Arbor, MI 48109-5940, USA.
J Am Med Inform Assoc. 2013 Mar-Apr;20(2):332-41. doi: 10.1136/amiajnl-2012-001117. Epub 2012 Sep 27.
We describe an approach for modeling temporal relationships in a large-scale association analysis of electronic health record data. The addition of temporal information can inform hypothesis generation and help to explain the relationships. We applied this approach to a dataset containing 41.2 million time-stamped International Classification of Diseases, Ninth Revision (ICD-9) codes from 1.6 million patients.
We performed two independent analyses: a pairwise association analysis using a χ² test and a temporal analysis using a binomial test. Data were visualized using network diagrams and reviewed for clinical significance.
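As a rough illustration of how these two analyses could be combined for a single pair of codes, the sketch below computes a Pearson χ² statistic on a 2×2 co-occurrence table and an exact two-sided binomial test on the order in which the two codes appear. It is a minimal sketch under stated assumptions, not the study's implementation: the use of each patient's first occurrence of a code, the null proportion of 0.5 for the binomial test, the absence of a continuity correction, and all function names and the placeholder codes CODE_A and CODE_B are assumptions made here for illustration.

```python
import math
from collections import defaultdict


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic and p-value (1 degree of freedom).

    a: patients with both codes, b: first code only,
    c: second code only, d: neither code. No continuity correction.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # a degenerate table carries no association signal
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom the upper tail equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value: sum of outcomes no likelier than k."""
    if n == 0:
        return 1.0
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


def first_occurrences(records):
    """Map patient -> {code: earliest day} from (patient, code, day) tuples."""
    first = defaultdict(dict)
    for pid, code, day in records:
        if code not in first[pid] or day < first[pid][code]:
            first[pid][code] = day
    return first


def analyze_pair(first, x, y, total_patients, min_gap_days=1):
    """Association (chi-squared) and direction (binomial) for codes x and y."""
    both = x_only = y_only = x_first = y_first = 0
    for codes in first.values():
        has_x, has_y = x in codes, y in codes
        if has_x and has_y:
            both += 1
            gap = codes[y] - codes[x]
            if gap >= min_gap_days:
                x_first += 1
            elif -gap >= min_gap_days:
                y_first += 1
        elif has_x:
            x_only += 1
        elif has_y:
            y_only += 1
    neither = total_patients - both - x_only - y_only
    stat, p_assoc = chi2_2x2(both, x_only, y_only, neither)
    p_dir = binomial_two_sided(x_first, x_first + y_first)
    return {"chi2": stat, "chi2_p": p_assoc,
            "x_first": x_first, "y_first": y_first, "direction_p": p_dir}


# Toy data only: (patient id, hypothetical code, day of record).
records = [(1, "CODE_A", 0), (1, "CODE_B", 400),
           (2, "CODE_A", 10), (2, "CODE_B", 5),
           (3, "CODE_A", 3)]
result = analyze_pair(first_occurrences(records), "CODE_A", "CODE_B",
                      total_patients=5)
```

Raising `min_gap_days` (for example to 365) restricts the direction test to pairs recorded at least that far apart, which is one way to approximate the range of intervals examined in the temporal analysis.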
We found nearly 400 000 highly associated pairs of ICD-9 codes, with varying numbers of strong temporal associations at intervals ranging from ≥1 day to ≥10 years. Most of the findings were not considered clinically novel, although some, such as an association between Helicobacter pylori infection and diabetes, have recently been reported in the literature. The temporal analysis in our large cohort, however, revealed that a diagnosis of diabetes usually preceded the diagnosis of H pylori infection, raising questions about possible cause and effect.
Such analyses have significant limitations, some of which are due to known problems with ICD-9 codes and others to potentially incomplete data even at a health system level. Nevertheless, large-scale association analyses with temporal modeling can provide a mechanism for novel discovery in support of hypothesis generation.
Temporal relationships can provide an additional layer of meaning in identifying and interpreting clinical associations.