Bellmann Louis, Wiederhold Alexander Johannes, Trübe Leona, Twerenbold Raphael, Ückert Frank, Gottfried Karl
Institute for Applied Medical Informatics, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
Department of Cardiology, University Heart & Vascular Center Hamburg, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
JMIR Med Inform. 2024 Jul 24;12:e49865. doi: 10.2196/49865.
Interpretability and intuitive visualization facilitate medical knowledge generation through big data. In addition, robustness to high-dimensional and missing data is a requirement for statistical approaches in the medical domain. A method tailored to the needs of physicians must meet all the abovementioned criteria.
This study aims to develop an accessible tool for visual data exploration that requires no programming knowledge, no adjustment of complex parameterizations, and no handling of missing data. We sought to base the statistical analysis on the setting of disease and control cohorts, which is familiar to clinical researchers. We aimed to guide the user by identifying and highlighting data patterns associated with disease and to reveal relations between attributes within the data set.
We introduce the attribute association graph, a novel graph structure designed for visual data exploration using robust statistical metrics. The nodes capture frequencies of participant attributes in disease and control cohorts as well as deviations between groups. The edges represent conditional relations between attributes. The graph is visualized using the Neo4j (Neo4j, Inc) data platform and can be interactively explored without the need for technical knowledge. Nodes with high deviations between cohorts and edges with notable conditional relationships are highlighted to guide the user during the exploration. The graph is accompanied by a dashboard visualizing variable distributions. For evaluation, we applied the graph and dashboard to the Hamburg City Health Study data set, a large cohort study conducted in the city of Hamburg, Germany. All data structures can be accessed freely by researchers, physicians, and patients. In addition, we developed a user test, conducted with physicians, that incorporates the System Usability Scale, individual questions, and user tasks.
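The node and edge definitions described above can be illustrated with a minimal sketch. The abstract does not specify the exact metrics, so everything here is an assumption made for illustration: attributes are treated as binary, the node deviation is taken as the simple difference in frequency between cohorts, the edge weight is taken as the conditional frequency of one attribute given another within the disease cohort, and missing values (`None`) are skipped per attribute rather than imputed. The function name `build_attribute_graph` and the example records are hypothetical.

```python
from itertools import permutations


def build_attribute_graph(records, disease_key, attributes):
    """Build node and edge statistics from a list of participant dicts.

    Assumptions (not taken from the paper): binary attributes, missing
    values encoded as None, deviation = frequency difference between
    cohorts, edge weight = P(b | a) within the disease cohort.
    """
    # Split participants into disease (True) and control (False) cohorts.
    cohorts = {True: [], False: []}
    for rec in records:
        if rec.get(disease_key) is not None:
            cohorts[bool(rec[disease_key])].append(rec)

    def frequency(rows, attr):
        # Use only participants for whom the attribute was observed.
        observed = [r[attr] for r in rows if r.get(attr) is not None]
        if not observed:
            return None
        return sum(1 for v in observed if v) / len(observed)

    nodes = {}
    for attr in attributes:
        f_disease = frequency(cohorts[True], attr)
        f_control = frequency(cohorts[False], attr)
        deviation = None
        if f_disease is not None and f_control is not None:
            deviation = f_disease - f_control
        nodes[attr] = {
            "disease": f_disease,
            "control": f_control,
            "deviation": deviation,
        }

    edges = {}
    for a, b in permutations(attributes, 2):
        # Restrict to disease-cohort rows where both attributes are observed.
        both = [
            r for r in cohorts[True]
            if r.get(a) is not None and r.get(b) is not None
        ]
        with_a = [r for r in both if r[a]]
        if with_a:
            edges[(a, b)] = sum(1 for r in with_a if r[b]) / len(with_a)
    return nodes, edges


# Hypothetical example records; None marks a missing value.
example_records = [
    {"cvd": True, "smoker": True, "hypertension": True},
    {"cvd": True, "smoker": True, "hypertension": None},
    {"cvd": True, "smoker": False, "hypertension": True},
    {"cvd": False, "smoker": False, "hypertension": False},
    {"cvd": False, "smoker": True, "hypertension": False},
    {"cvd": False, "smoker": None, "hypertension": True},
]
example_nodes, example_edges = build_attribute_graph(
    example_records, "cvd", ["smoker", "hypertension"]
)
```

In this sketch, nodes with a large absolute `deviation` and edges with a high conditional frequency would be the candidates for highlighting. The thresholds used for highlighting in the actual tool are not stated in the abstract.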
We evaluated the attribute association graph and dashboard through an exemplary data analysis of participants with a general cardiovascular disease in the Hamburg City Health Study data set. All results extracted from the graph structure and dashboard are in accordance with findings from the literature, except for unusually low cholesterol levels in participants with cardiovascular disease, which could be induced by medication. The 95% CIs of Pearson correlation coefficients were also calculated for all associations identified during the data analysis, confirming the results. Finally, a user test with 10 physicians was conducted to assess the usability of the proposed methods. It yielded a System Usability Scale score of 70.5% and an average successful task completion of 81.4%.
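As an illustration of the confirmatory step, a 95% CI for a Pearson correlation coefficient can be approximated with the Fisher z-transformation. The abstract does not say which method the authors used to compute the intervals, so the sketch below is one common approach, not a description of their procedure. The function names are hypothetical, and pairs with a missing value (`None`) are simply dropped.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Return (r, n) using only pairs where both values are observed."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least 2 complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy), n


def pearson_ci(r, n, level=0.95):
    """Approximate CI for r via the Fisher z-transformation.

    Requires n > 3 and |r| < 1; assumes approximate bivariate normality.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)                      # Fisher transform of r
    se = 1.0 / math.sqrt(n - 3)            # standard error on the z scale
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - q * se), math.tanh(z + q * se)
```

An association would be regarded as supported when the resulting interval excludes zero.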
The proposed attribute association graph and dashboard enable intuitive visual data exploration. They are robust to high-dimensional as well as missing data and require no parameterization. The usability for clinicians was confirmed via a user test, and the validity of the statistical results was confirmed by associations known from literature and standard statistical inference.