IEEE Trans Vis Comput Graph. 2021 Feb;27(2):1481-1491. doi: 10.1109/TVCG.2020.3030455. Epub 2021 Jan 28.
The collection and visual analysis of large-scale data from complex systems, such as electronic health records or clickstream data, has become increasingly common across a wide range of industries. This type of retrospective visual analysis, however, is prone to a variety of selection bias effects, especially for high-dimensional data where only a subset of dimensions is visualized at any given time. The risk of selection bias is even higher when analysts dynamically apply filters or perform grouping operations during ad hoc analyses. These bias effects threaten the validity and generalizability of the insights discovered during visual analysis, which in turn serve as the basis for decision making. Past work has focused on bias transparency, helping users understand when selection bias may have occurred. However, countering the effects of selection bias through bias mitigation is typically left to the user as a separate process. Dynamic reweighting (DR) is a novel computational approach to selection bias mitigation that helps users craft bias-corrected visualizations. This paper describes the DR workflow, introduces key DR visualization designs, and presents statistical methods that support the DR process. Use cases from the medical domain, as well as findings from interviews with domain expert users, are also reported.
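The abstract does not spell out the statistical methods behind DR. As a rough illustration of what "reweighting" a filtered subset can mean, the sketch below applies plain post-stratification on a single categorical dimension. Each record in the filtered subset gets a weight equal to its stratum's share of the full population divided by that stratum's share of the subset, so that weighted summaries match the population on that dimension. This is a generic textbook technique chosen for illustration, not the paper's method. The function names `poststratification_weights` and `weighted_mean` and the toy data are hypothetical.

```python
from collections import Counter


def poststratification_weights(subset_strata, population_strata):
    """Hypothetical helper (not from the paper): weight each record in a
    filtered subset so the weighted distribution of one categorical
    dimension matches the full population.

    weight(s) = P_population(s) / P_subset(s)
              = (pop_count[s] * n_subset) / (n_population * subset_count[s])
    """
    n_sub = len(subset_strata)
    n_pop = len(population_strata)
    sub_counts = Counter(subset_strata)
    pop_counts = Counter(population_strata)
    # Integer products first, then one division, to keep the ratio exact
    # where possible. A stratum missing from the population gets weight 0.
    return [
        (pop_counts[s] * n_sub) / (n_pop * sub_counts[s])
        for s in subset_strata
    ]


def weighted_mean(values, weights):
    """Weighted average of an outcome over the reweighted subset."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    # Toy data: the population is 50% F / 50% M, but a filter kept 8 F and 2 M.
    population = ["F"] * 50 + ["M"] * 50
    subset = ["F"] * 8 + ["M"] * 2
    outcome = [1] * 8 + [0] * 2  # e.g., 1 = event observed

    w = poststratification_weights(subset, population)
    print(sum(outcome) / len(outcome))  # unweighted rate: 0.8
    print(weighted_mean(outcome, w))    # reweighted rate: 0.5
```

In this toy case the filter over-selects one group (80% versus 50% in the population), which inflates the raw outcome rate to 0.8. Reweighting restores the population's 50/50 mix and yields 0.5. Post-stratification corrects only the dimensions it conditions on, and it cannot recover a stratum that the filter removed entirely.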