Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, United States of America.
Department of Biostatistics, Vanderbilt University, Nashville, Tennessee, United States of America.
PLoS One. 2019 Nov 27;14(11):e0225495. doi: 10.1371/journal.pone.0225495. eCollection 2019.
Increasing reliance on electronic medical records (EMRs) at large medical centers provides unique opportunities to perform population-level analyses exploring disease progression and etiology. The massive accumulation of diagnostic, procedure, and laboratory codes in one place has enabled the exploration of co-occurring conditions, their risk factors, and potential prognostic factors. While most of the readily identifiable associations in medical records are (now) well known to the scientific community, there are undoubtedly many more relationships yet to be uncovered in EMR data. In this paper, we introduce a novel finding index to help with that task. This new index uses data mined from real-time PubMed abstracts to indicate the extent to which empirically discovered associations are already known (i.e., present in the scientific literature). Our methods leverage second-generation p-values, which are better at identifying associations that are truly clinically meaningful. We illustrate our new method with three examples: Autism Spectrum Disorder, Alzheimer's Disease, and Optic Neuritis. Our results demonstrate the method's wide utility for identifying the new associations in EMR data that deserve the highest priority within the complex web of correlations and causalities. With it, data scientists and clinicians can work together more effectively to discover novel associations that are both empirically reliable and clinically understudied.
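The abstract relies on second-generation p-values without stating how they are computed. The sketch below implements the published general definition and is offered for illustration only. For an interval estimate I, such as a confidence interval for a log odds ratio, and an interval null hypothesis H0 (the "zone" of clinically trivial effects), the value is |I ∩ H0| / |I| × max(|I| / (2|H0|), 1). The result is 0 when the data support only clinically meaningful effects, 1 when they support only trivial effects, and lies strictly between 0 and 1 when they are inconclusive. The function names `sgpv` and `interpret`, along with the interval bounds in the usage example, are hypothetical and are not taken from the paper. The sketch does not reproduce the authors' novelty index, which the abstract does not specify.

```python
def sgpv(est_lo: float, est_hi: float, null_lo: float, null_hi: float) -> float:
    """Second-generation p-value for an interval estimate [est_lo, est_hi]
    against an interval null hypothesis [null_lo, null_hi].

    Hypothetical helper: a sketch of the general definition, assuming both
    intervals have positive length (point estimates are not handled here).
    """
    if est_hi <= est_lo or null_hi <= null_lo:
        raise ValueError("both intervals must have positive length")

    est_len = est_hi - est_lo
    null_len = null_hi - null_lo

    # Length of the overlap between the interval estimate and the null zone.
    overlap = max(0.0, min(est_hi, null_hi) - max(est_lo, null_lo))

    # Fraction of the estimate lying in the null zone, with a correction that
    # caps the value at 1/2 when the estimate is more than twice as wide as
    # the null zone (i.e., the data are too imprecise to be informative).
    return (overlap / est_len) * max(est_len / (2.0 * null_len), 1.0)


def interpret(p_delta: float) -> str:
    """Hypothetical helper mapping an SGPV to its conventional reading."""
    if p_delta == 0.0:
        return "data support only clinically meaningful effects"
    if p_delta == 1.0:
        return "data support only trivial (null) effects"
    return "inconclusive"


if __name__ == "__main__":
    # Hypothetical example: 95% CI for a log odds ratio of [0.4, 0.9] and an
    # assumed null zone of [-0.1, 0.1] on the same scale.
    p = sgpv(0.4, 0.9, -0.1, 0.1)
    print(p, interpret(p))  # prints: 0.0 data support only clinically meaningful effects
```

The width of the null zone is a substantive, analyst-chosen input. Widening it makes the method more conservative about declaring an association clinically meaningful, which is why second-generation p-values screen out statistically significant but trivial effects.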