Probability binning comparison: a metric for quantitating multivariate distribution differences.

Author information

Roederer M, Moore W, Treister A, Hardy R R, Herzenberg L A

Affiliations

Vaccine Research Center, NIH, Bethesda, Maryland 20892-3015, USA.

Publication information

Cytometry. 2001 Sep 1;45(1):47-55. doi: 10.1002/1097-0320(20010901)45:1<47::aid-cyto1143>3.0.co;2-a.

Abstract

BACKGROUND

While several algorithms for the comparison of univariate distributions arising from flow cytometric analyses have been developed and studied for many years, algorithms for comparing multivariate distributions remain elusive. Such algorithms could be useful for comparing differences between samples based on several independent measurements, rather than differences based on any single measurement. It is conceivable that distributions could be completely distinct in multivariate space, but unresolvable in any combination of univariate histograms. Multivariate comparisons could also be useful for providing feedback about instrument stability, when only subtle changes in measurements are occurring.

METHODS

We apply a variant of Probability Binning, described in the accompanying article, to multidimensional data. In this approach, hyper-rectangles of n dimensions (where n is the number of measurements being compared) comprise the bins used for the chi-squared statistic. These hyper-dimensional bins are constructed such that the control sample has the same number of events in each bin; the bins are then applied to the test samples for chi-squared calculations.
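
The binning step described above can be illustrated with a minimal sketch. The abstract does not say how the hyper-rectangles are chosen or which form of the chi-squared statistic is used, so two details here are assumptions: the bins come from recursive median splits along the dimension with the largest variance, and the statistic is a normalized two-sample form computed on per-bin event fractions. The names `build_bins`, `bin_of`, `bin_counts`, and `chi_squared` are hypothetical.

```python
import random
from statistics import pvariance

def build_bins(control, depth):
    """Split control events into up to 2**depth hyper-rectangles that hold
    (nearly) equal numbers of events. Returns a nested tuple tree."""
    if depth == 0 or len(control) < 2:
        return None  # leaf: one hyper-rectangular bin
    # Assumed rule: cut the dimension with the largest variance at its median.
    dim = max(range(len(control[0])),
              key=lambda d: pvariance([e[d] for e in control]))
    ordered = sorted(control, key=lambda e: e[dim])
    mid = len(ordered) // 2
    cut = ordered[mid][dim]
    return (dim, cut,
            build_bins(ordered[:mid], depth - 1),
            build_bins(ordered[mid:], depth - 1))

def bin_of(tree, event):
    """Follow the cuts down the tree; the path taken identifies the bin."""
    path = []
    node = tree
    while node is not None:
        dim, cut, low, high = node
        go_high = event[dim] >= cut
        path.append(go_high)
        node = high if go_high else low
    return tuple(path)

def bin_counts(tree, events):
    """Count how many events fall into each bin."""
    counts = {}
    for event in events:
        key = bin_of(tree, event)
        counts[key] = counts.get(key, 0) + 1
    return counts

def chi_squared(tree, control, test):
    """Assumed statistic: sum over bins of (c - s)^2 / (c + s), where c and s
    are the fractions of control and test events in the bin."""
    c_counts = bin_counts(tree, control)
    s_counts = bin_counts(tree, test)
    total = 0.0
    for key, c_n in c_counts.items():
        c = c_n / len(control)
        s = s_counts.get(key, 0) / len(test)
        total += (c - s) ** 2 / (c + s)
    return total

# Synthetic two-parameter data: bins are built on the control sample only,
# then applied unchanged to each test sample.
rng = random.Random(0)
control = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2048)]
same = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2048)]
shifted = [(rng.gauss(1, 1), rng.gauss(1, 1)) for _ in range(2048)]

tree = build_bins(control, depth=5)  # 2**5 = 32 bins
print("same distribution:   ", chi_squared(tree, control, same))
print("shifted distribution:", chi_squared(tree, control, shifted))
```

Because every cut is placed at a median of the control events, each bin holds the same number of control events when there are no tied values, so differences in the statistic come from how the test events redistribute across those bins.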

RESULTS

Using a Monte-Carlo simulation, we determined the distribution of chi-squared values obtained by comparing sets of events from the same distribution; this distribution of chi-squared values was identical to that obtained for the univariate algorithm. Hence, the same formulae can be used to construct a metric, analogous to a t-score, that estimates the probability with which distributions are distinct. As for univariate comparisons, this metric scales with the difference between two distributions, and can be used to rank samples according to similarity to a control. We apply the algorithm to multivariate immunophenotyping data, and demonstrate that it can be used to discriminate distinct samples and to rank samples according to a biologically meaningful difference.
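
The Monte-Carlo step and the t-score-like metric can be sketched as follows. This is not the authors' closed-form formula, which the abstract does not state. As a stand-in, the sketch estimates the mean and standard deviation of the chi-squared statistic empirically, from repeated comparisons of samples drawn from one distribution, and reports how many standard deviations a test comparison lies above that mean. The binning rule, the form of the statistic, the synthetic three-parameter data, and the names `make_binner`, `chi_squared`, `draw`, and `score` are all assumptions.

```python
import random
from statistics import mean, pstdev

def variance(values):
    """Population variance (kept simple for speed)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def make_binner(control, depth):
    """Return a function mapping an event to a bin id in [0, 2**depth).
    Bins come from recursive median splits of the control sample
    along the highest-variance dimension (an assumed construction)."""
    if depth == 0:
        return lambda event: 0
    dim = max(range(len(control[0])),
              key=lambda d: variance([e[d] for e in control]))
    ordered = sorted(control, key=lambda e: e[dim])
    mid = len(ordered) // 2
    cut = ordered[mid][dim]
    low = make_binner(ordered[:mid], depth - 1)
    high = make_binner(ordered[mid:], depth - 1)
    half = 2 ** (depth - 1)
    return lambda e: (half + high(e)) if e[dim] >= cut else low(e)

def chi_squared(control, test, depth=4):
    """Assumed statistic: sum of (c - s)^2 / (c + s) over bin fractions."""
    binner = make_binner(control, depth)
    c = [0] * (2 ** depth)
    s = [0] * (2 ** depth)
    for e in control:
        c[binner(e)] += 1
    for e in test:
        s[binner(e)] += 1
    total = 0.0
    for c_n, s_n in zip(c, s):
        cf, sf = c_n / len(control), s_n / len(test)
        if cf + sf > 0:
            total += (cf - sf) ** 2 / (cf + sf)
    return total

def draw(rng, n, shift=0.0):
    """Synthetic three-parameter events; `shift` moves the first parameter."""
    return [(rng.gauss(shift, 1), rng.gauss(0, 1), rng.gauss(0, 1))
            for _ in range(n)]

rng = random.Random(1)
n_events = 500

# Monte-Carlo null: both samples come from the same distribution.
null = [chi_squared(draw(rng, n_events), draw(rng, n_events))
        for _ in range(200)]
mu, sigma = mean(null), pstdev(null)

def score(chi2):
    """Standard deviations above the null mean, floored at zero."""
    return max(0.0, (chi2 - mu) / sigma)

control = draw(rng, n_events)
scores = {shift: score(chi_squared(control, draw(rng, n_events, shift)))
          for shift in (0.0, 0.25, 0.5, 1.0)}
for shift, value in scores.items():
    print(f"shift={shift:<5} score={value:.1f}")
```

Under these assumptions the score should stay near zero for samples drawn from the control distribution and grow as the shift increases, which is the ranking behaviour the abstract describes.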

CONCLUSION

Probability binning, as shown here, provides a useful metric for determining the probability with which two or more multivariate distributions represent distinct sets of data. The metric can be used to identify the similarity or dissimilarity of samples. Finally, as demonstrated in the accompanying paper, the algorithm can be used to gate on events in one sample that are different from a control sample, even if those events cannot be distinguished on the basis of any combination of univariate or bivariate displays. Published 2001 Wiley-Liss, Inc.
