U-CIE [/juː 'siː/]: Color encoding of high-dimensional data.

Affiliations

Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

Resource on Biocomputing, Visualization, and Informatics, University of California, California.

Publication information

Protein Sci. 2022 Sep;31(9):e4388. doi: 10.1002/pro.4388.

Abstract

Data visualization is essential for discovering patterns and anomalies in large high-dimensional datasets. New dimensionality reduction techniques have thus been developed for visualizing omics data, in particular from single-cell studies. However, jointly showing several types of data, for example, single-cell expression and gene networks, remains a challenge. Here, we present U-CIE, a visualization method that encodes arbitrary high-dimensional data as colors using a combination of dimensionality reduction and the CIELAB color space to retain the original structure to the extent possible. U-CIE first uses UMAP to reduce high-dimensional data to three dimensions, partially preserving distances between entities. Next, it embeds the resulting three-dimensional representation within the CIELAB color space. This color model was designed to be perceptually uniform, meaning that the Euclidean distance between any two points should correspond to their relative perceptual difference. The combination of UMAP and CIELAB therefore results in a color encoding that captures much of the structure of the original high-dimensional data. We illustrate its broad applicability by visualizing single-cell data on a protein network and metagenomic data on a world map and on scatter plots.
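
The second step described above, placing three-dimensional coordinates inside CIELAB and reading them back as colors, can be illustrated with a minimal Python sketch. It is not the U-CIE implementation. It assumes the 3-D coordinates already exist (for example from umap-learn's `UMAP(n_components=3).fit_transform(X)`), and it replaces whatever fitting procedure the tool uses with a plain uniform scaling into a cube centered at L* = 60, a* = b* = 0. The names `lab_to_srgb`, `delta_e`, and `encode_colors`, along with the `center` and `radius` parameters, are hypothetical and chosen only for this illustration.

```python
import math

# D65 reference white (2-degree observer), normalised so that Y = 1.
WHITE = (0.95047, 1.0, 1.08883)


def lab_to_srgb(L, a, b):
    """Convert a CIELAB colour (D65) to sRGB in [0, 1]; out-of-gamut values are clipped."""
    fy = (L + 16.0) / 116.0
    fx = fy + a / 500.0
    fz = fy - b / 200.0

    def f_inv(t):
        # Inverse of the CIELAB companding function.
        delta = 6.0 / 29.0
        return t ** 3 if t > delta else 3 * delta ** 2 * (t - 4.0 / 29.0)

    x, y, z = (w * f_inv(t) for w, t in zip(WHITE, (fx, fy, fz)))

    # XYZ -> linear sRGB (standard D65 matrix).
    r = 3.2406 * x - 1.5372 * y - 0.4986 * z
    g = -0.9689 * x + 1.8758 * y + 0.0415 * z
    bl = 0.0557 * x - 0.2040 * y + 1.0570 * z

    def gamma(c):
        # Clip to the displayable range, then apply the sRGB transfer function.
        c = min(max(c, 0.0), 1.0)
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

    return tuple(gamma(c) for c in (r, g, bl))


def delta_e(lab1, lab2):
    """CIE76 colour difference: the Euclidean distance between two CIELAB points."""
    return math.dist(lab1, lab2)


def encode_colors(points, center=(60.0, 0.0, 0.0), radius=30.0):
    """Map 3-D points to hex colours by uniformly scaling them into a CIELAB cube.

    A single scale factor is used for all three axes so that relative
    distances between points are preserved (assumption of this sketch;
    the axis-to-channel assignment here is arbitrary).
    """
    mids = [
        (max(p[i] for p in points) + min(p[i] for p in points)) / 2
        for i in range(3)
    ]
    extent = max(abs(p[i] - mids[i]) for p in points for i in range(3)) or 1.0
    scale = radius / extent
    colors = []
    for p in points:
        lab = [center[i] + scale * (p[i] - mids[i]) for i in range(3)]
        r, g, b = lab_to_srgb(*lab)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colors


if __name__ == "__main__":
    # Toy coordinates standing in for a 3-D UMAP embedding.
    embedding = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    for point, color in zip(embedding, encode_colors(embedding)):
        print(point, color)
```

Because every axis shares one scale factor, points that are close in the 3-D embedding stay close in CIELAB, so `delta_e` between their colors stays proportional to their embedded distance wherever no clipping occurs. A larger `radius` spreads the colors further apart but pushes more points outside the sRGB gamut, where clipping distorts those distances.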

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3223/9387205/c0ff310270d3/PRO-31-e4388-g001.jpg
