Litron Laboratories, Rochester, New York, USA.
Institute of Life Sciences, Swansea University, Swansea, UK.
Environ Mol Mutagen. 2024 Jun;65(5):156-178. doi: 10.1002/em.22604. Epub 2024 May 17.
This article describes a range of high-dimensional data visualization strategies that we explored for their ability to complement machine learning algorithm predictions derived from MultiFlow® assay results. For this exercise, we focused on seven biomarker responses resulting from the exposure of TK6 cells to each of 126 diverse chemicals over a range of concentrations. The challenges of visualizing seven biomarker responses were compounded whenever the goal was to represent the entire 126-chemical data set rather than results from a single chemical. Scatter plots, spider plots, parallel coordinate plots, hierarchical clustering, principal component analysis, toxicological prioritization index, multidimensional scaling, t-distributed stochastic neighbor embedding, and uniform manifold approximation and projection are each considered in turn, and our report provides a comparative analysis of these techniques. In an era where multiplexed assays and machine learning algorithms are becoming the norm, stakeholders should find some of these visualization strategies useful for interpreting their high-dimensional data efficiently and effectively.
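To make one of the listed techniques concrete, the following is a minimal sketch of principal component analysis applied to a chemicals-by-biomarkers matrix with the same shape as the data set described above (126 rows, seven columns). It is not the authors' implementation: the data are synthetic, the two-latent-factor structure is an assumption made purely for illustration, and the function names (`standardize`, `covariance`, `top_component`, `pca_2d`, `synthetic_responses`) are hypothetical. In practice an established statistics library would normally be used; plain Python with power iteration is used here only so the steps are visible.

```python
import random


def standardize(rows):
    """Center each column to mean 0 and scale it to unit sample variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)
        sds.append(var ** 0.5 or 1.0)  # fall back to 1.0 for constant columns
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=500, seed=0):
    """Approximate the leading eigenpair of a symmetric matrix by power iteration."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue estimate for the unit vector v.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v


def pca_2d(rows):
    """Return 2-D scores and the fraction of variance explained by each axis."""
    z = standardize(rows)
    d = len(z[0])
    cov = covariance(z)
    components = []
    for _ in range(2):
        eigval, v = top_component(cov)
        components.append((eigval, v))
        # Deflate: remove the component just found before searching for the next.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(r[j] * v[j] for j in range(d)) for _, v in components]
              for r in z]
    # After standardization the total variance equals the number of columns.
    explained = [eigval / d for eigval, _ in components]
    return scores, explained


def synthetic_responses(n_chemicals=126, seed=1):
    """Synthetic stand-in data: 7 columns driven by two assumed latent factors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_chemicals):
        f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
        row = [f1 + rng.gauss(0, 0.3) for _ in range(4)]
        row += [f2 + rng.gauss(0, 0.3) for _ in range(3)]
        rows.append(row)
    return rows


scores, explained = pca_2d(synthetic_responses())
print("variance explained by PC1, PC2:", [round(e, 3) for e in explained])
print("first chemical's 2-D coordinates:", [round(s, 3) for s in scores[0]])
```

Each row of `scores` gives one chemical's position on the first two principal axes, which is the pair of coordinates a two-dimensional PCA scatter plot would display. Standardizing first is a design choice in this sketch, made so that biomarkers measured on different scales contribute comparably.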