Visual Pattern-Driven Exploration of Big Data.

Authors

Behrisch Michael, Schreck Tobias, Krüger Robert, Gehlenborg Nils, Lekschas Fritz, Pfister Hanspeter

Affiliations

Harvard University, Cambridge, USA.

Graz University of Technology, Graz, Austria.

Publication information

2018 Int Symp Big Data Vis Immers Analyt (BDVA) (2018). 2018 Oct;2018. doi: 10.1109/BDVA.2018.8534028. Epub 2018 Nov 15.

Abstract

Pattern extraction algorithms enable insights into today's ever-growing datasets by translating recurring data properties into compact representations. Yet a practical problem arises: as data volume and complexity increase, so does the number of patterns, leaving the analyst with a vast result space. Current algorithmic approaches, and especially visualization approaches, often fail to answer central overview questions essential for a comprehensive understanding of pattern distributions and support, their quality, and their relevance to the analysis task. To address these challenges, we contribute a visual analytics pipeline targeted at the semi-automatic, pattern-driven exploration of result spaces. Specifically, we combine image feature analysis and unsupervised learning to partition the pattern space into interpretable, coherent chunks, which should be given priority in a subsequent in-depth analysis. In our analysis scenarios, no ground truth is given. Thus, we employ and evaluate novel quality metrics derived from the distance distributions of our image feature vectors and the derived cluster model to guide the feature selection process. We visualize our results interactively, allowing the user to drill down from overview to detail in the pattern space, and we demonstrate our techniques in two case studies on Earth observation and biomedical genomic data.
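The pipeline outlined in the abstract (describe each pattern image with a feature vector, cluster without labels, then rank candidate feature sets by a quality score computed from pairwise distances and the cluster model) might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the image descriptors, the clustering algorithm, or the quality metrics used, so the histogram and gradient features, the plain k-means, the between/within distance ratio, and the synthetic striped patches below are all hypothetical stand-ins rather than the authors' method.

```python
import numpy as np

def make_patches(n_per_kind=20, size=16, seed=0):
    """Synthetic stand-in data: noisy vertical-stripe, then horizontal-stripe patches."""
    rng = np.random.default_rng(seed)
    stripes = (np.arange(size) % 2).astype(float)
    vertical = np.tile(stripes, (size, 1))   # values alternate along each row
    horizontal = vertical.T                  # values alternate along each column
    patches = []
    for base in (vertical, horizontal):
        for _ in range(n_per_kind):
            noisy = base + rng.normal(0.0, 0.05, base.shape)
            patches.append(np.clip(noisy, 0.0, 1.0))
    return patches

def histogram_features(patches, bins=8):
    """Hypothetical descriptor 1: normalized intensity histogram per patch."""
    feats = np.asarray(
        [np.histogram(p, bins=bins, range=(0.0, 1.0))[0] for p in patches],
        dtype=float,
    )
    return feats / feats.sum(axis=1, keepdims=True)

def gradient_features(patches):
    """Hypothetical descriptor 2: mean absolute difference along each axis."""
    return np.asarray(
        [[np.abs(np.diff(p, axis=0)).mean(), np.abs(np.diff(p, axis=1)).mean()]
         for p in patches]
    )

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one integer cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array(
            [X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
             for j in range(k)]
        )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def separation_score(X, labels):
    """Assumed quality metric: mean between-cluster distance divided by
    mean within-cluster distance (higher suggests better-separated clusters)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(len(X), dtype=bool)
    within = dist[same & off_diagonal]
    between = dist[~same]
    if within.size == 0 or between.size == 0:
        return 0.0
    return float(between.mean() / (within.mean() + 1e-12))

# Score each candidate feature set without ground truth and keep the best one.
patches = make_patches()
candidates = {
    "histogram": histogram_features(patches),
    "gradient": gradient_features(patches),
}
labels = {name: kmeans(X, k=2) for name, X in candidates.items()}
scores = {name: separation_score(X, labels[name]) for name, X in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy setup both stripe orientations share the same intensity distribution, so the histogram descriptor cannot tell them apart while the gradient descriptor can; the distance-based score is what signals that difference without any labels. A real analysis would substitute its own descriptors and metrics.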
