Probability binning comparison: a metric for quantitating univariate distribution differences.

Authors

Roederer M, Treister A, Moore W, Herzenberg L A

Affiliation

Vaccine Research Center, NIH, Bethesda, Maryland 20892-3015, USA.

Publication information

Cytometry. 2001 Sep 1;45(1):37-46. doi: 10.1002/1097-0320(20010901)45:1<37::aid-cyto1142>3.0.co;2-e.

PMID:11598945

Abstract

BACKGROUND

Comparing distributions of data is an important goal in many applications. For example, determining whether two samples (e.g., a control and test sample) are statistically significantly different is useful to detect a response, or to provide feedback regarding instrument stability by detecting when collected data varies significantly over time.

METHODS

We apply a variant of the chi-squared statistic to comparing univariate distributions. In this variant, a control distribution is divided such that an equal number of events fall into each of the divisions, or bins. This approach is thereby a mini-max algorithm, in that it minimizes the maximum expected variance for the control distribution. The control-derived bins are then applied to test sample distributions, and a normalized chi-squared value is computed. We term this algorithm Probability Binning.
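The binning step described above can be illustrated with a minimal Python sketch. It assumes NumPy, and every function name in it is hypothetical. The abstract does not give the exact formula for the normalized chi-squared value, so the per-bin term used here, the squared difference of bin fractions divided by their sum, is an assumption chosen for illustration and not necessarily the authors' definition.

```python
import numpy as np


def probability_bin_edges(control, n_bins):
    """Interior edges that place (nearly) equal numbers of control events in each bin."""
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(control, quantiles)


def bin_fractions(sample, edges):
    """Fraction of the sample's events that fall into each control-derived bin."""
    idx = np.searchsorted(edges, sample, side="right")
    counts = np.bincount(idx, minlength=len(edges) + 1)
    return counts / len(sample)


def normalized_chi2(control, test, n_bins=10):
    """Chi-squared-like statistic on bin fractions (normalization is an assumption)."""
    edges = probability_bin_edges(control, n_bins)
    c = bin_fractions(control, edges)
    s = bin_fractions(test, edges)
    denom = c + s
    keep = denom > 0  # skip bins that are empty in both samples
    return float(np.sum((c[keep] - s[keep]) ** 2 / denom[keep]))


rng = np.random.default_rng(0)
control = rng.normal(size=10_000)
print(normalized_chi2(control, control + 0.1))  # small shift
print(normalized_chi2(control, control + 1.0))  # large shift
```

Because the bins are derived from the control sample only, a test sample that matches the control puts roughly the same fraction of events in every bin, and the statistic grows as the test distribution moves away from the control.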

RESULTS

Using a Monte-Carlo simulation, we determined the distribution of chi-squared values obtained by comparing sets of events derived from the same distribution. Based on this distribution, we derive a conversion of any given chi-squared value into a metric that is analogous to a t-score, i.e., it can be used to estimate the probability that a test distribution is different from a control distribution. We demonstrate that this metric scales with the difference between two distributions, and can be used to rank samples according to similarity to a control. Finally, we demonstrate the applicability of this metric to ranking immunophenotyping distributions to suggest that it indeed can be used to objectively determine the relative distance of distributions compared to a single control.
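A minimal sketch of the Monte-Carlo idea, assuming NumPy and using hypothetical function names: repeatedly draw two samples from the same distribution, compute the binned chi-squared statistic for each pair, and use the resulting null distribution to standardize an observed value. The standardization below (subtract the null mean, divide by the null standard deviation, clip at zero) is an assumption for illustration; the abstract does not state the conversion the authors derived, and the statistic's normalization is likewise assumed.

```python
import numpy as np


def pb_chi2(control, test, n_bins=10):
    """Chi-squared-like statistic on equal-count control bins (assumed normalization)."""
    edges = np.quantile(control, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    c = np.bincount(np.searchsorted(edges, control, side="right"), minlength=n_bins) / len(control)
    s = np.bincount(np.searchsorted(edges, test, side="right"), minlength=n_bins) / len(test)
    denom = c + s
    keep = denom > 0
    return float(np.sum((c[keep] - s[keep]) ** 2 / denom[keep]))


def null_distribution(n_events, n_bins=10, n_trials=500, seed=0):
    """Statistic values for pairs of samples drawn from the same (standard normal) distribution."""
    rng = np.random.default_rng(seed)
    return np.array([
        pb_chi2(rng.normal(size=n_events), rng.normal(size=n_events), n_bins)
        for _ in range(n_trials)
    ])


def t_like_score(chi2, null_values):
    """Standardize an observed statistic against the simulated null; never below zero."""
    z = (chi2 - null_values.mean()) / null_values.std(ddof=1)
    return max(0.0, float(z))


null = null_distribution(n_events=1000)
rng = np.random.default_rng(42)
control = rng.normal(size=1000)
for shift in (0.0, 0.5, 1.0):
    test = rng.normal(loc=shift, size=1000)
    print(shift, t_like_score(pb_chi2(control, test), null))
```

In this sketch the score stays near zero when both samples come from the same distribution and increases with the shift, which mirrors the ranking behaviour the abstract describes.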

CONCLUSION

Probability Binning, as shown here, provides a useful metric for determining the probability that two or more flow cytometric data distributions are different. This metric can also be used to rank distributions to identify which are most similar or dissimilar. In addition, the algorithm can be used to quantitate contamination of even highly-overlapping populations. Finally, as demonstrated in an accompanying paper, Probability Binning can be used to gate on events that represent significantly different subsets from a control sample. Published 2001 Wiley-Liss, Inc.

Similar articles

Probability binning comparison: a metric for quantitating univariate distribution differences.

Cytometry. 2001 Sep 1;45(1):37-46. doi: 10.1002/1097-0320(20010901)45:1<37::aid-cyto1142>3.0.co;2-e.

Probability binning comparison: a metric for quantitating multivariate distribution differences.

Cytometry. 2001 Sep 1;45(1):47-55. doi: 10.1002/1097-0320(20010901)45:1<47::aid-cyto1143>3.0.co;2-a.

Frequency difference gating: a multivariate method for identifying subsets that differ between samples.

Cytometry. 2001 Sep 1;45(1):56-64. doi: 10.1002/1097-0320(20010901)45:1<56::aid-cyto1144>3.0.co;2-9.

Probability binning and testing agreement between multivariate immunofluorescence histograms: extending the chi-squared test.

Cytometry. 2001 Oct 1;45(2):141-50. doi: 10.1002/1097-0320(20011001)45:2<141::aid-cyto1156>3.0.co;2-m.

Quadratic form: a robust metric for quantitative comparison of flow cytometric histograms.

Cytometry A. 2008 Aug;73(8):715-26. doi: 10.1002/cyto.a.20586.

Multiway contingency tables: Monte Carlo resampling probability values for the chi-squared and likelihood-ratio tests.

Psychol Rep. 2010 Oct;107(2):501-10. doi: 10.2466/03.PR0.107.5.501-510.

A voxel-dose algorithm of heterogeneous activity distribution for Monte-Carlo simulation of radionuclide therapy dosimetry.

Cancer Biother Radiopharm. 2012 Aug;27(6):344-52. doi: 10.1089/cbr.2012.1173.

Calculating the probability of random sampling for continuous variables in submitted or published randomised controlled trials.

Anaesthesia. 2015 Jul;70(7):848-58. doi: 10.1111/anae.13126. Epub 2015 May 29.

Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure.

Cytometry A. 2016 Jan;89(1):71-88. doi: 10.1002/cyto.a.22735. Epub 2015 Aug 14.

Profiling of polychromatic flow cytometry data on B-cells reveals patients' clusters in common variable immunodeficiency.

Cytometry A. 2009 Nov;75(11):902-9. doi: 10.1002/cyto.a.20801.

Cited by

Investigating the Cell Entry Mechanism, Disassembly, and Toxicity of the Nanocage PCC-1: Insights into Its Potential as a Drug Delivery Vehicle.

J Am Chem Soc. 2023 Dec 20;145(50):27690-27701. doi: 10.1021/jacs.3c09918. Epub 2023 Dec 9.

Deciphering variations in the endocytic uptake of a cell-penetrating peptide: the crucial role of cell culture protocols.

Cytotechnology. 2023 Dec;75(6):473-490. doi: 10.1007/s10616-023-00591-1. Epub 2023 Sep 8.

Lactate induces PD-L1 in HRAS-positive oropharyngeal squamous cell carcinoma.

Oncotarget. 2020 Apr 28;11(17):1493-1504. doi: 10.18632/oncotarget.27348.

Human microglia regional heterogeneity and phenotypes determined by multiplexed single-cell mass cytometry.

Nat Neurosci. 2019 Jan;22(1):78-90. doi: 10.1038/s41593-018-0290-2. Epub 2018 Dec 17.

FAST: Rapid determinations of antibiotic susceptibility phenotypes using label-free cytometry.

Cytometry A. 2018 Jun;93(6):639-648. doi: 10.1002/cyto.a.23370. Epub 2018 May 7.

Characterization of Diffusion Metric Map Similarity in Data From a Clinical Data Repository Using Histogram Distances.

Front Neurosci. 2018 Mar 8;12:133. doi: 10.3389/fnins.2018.00133. eCollection 2018.

Human group 2 innate lymphoid cells do not express the IL-5 receptor.

J Allergy Clin Immunol. 2017 Nov;140(5):1430-1433.e4. doi: 10.1016/j.jaci.2017.04.025. Epub 2017 May 10.

A benchmark for evaluation of algorithms for identification of cellular correlates of clinical outcomes.

Cytometry A. 2016 Jan;89(1):16-21. doi: 10.1002/cyto.a.22732. Epub 2015 Oct 8.

Rapid cytometric antibiotic susceptibility testing utilizing adaptive multidimensional statistical metrics.

Anal Chem. 2015 Feb 3;87(3):1941-9. doi: 10.1021/ac504241x. Epub 2015 Jan 13.

Suppression of Foxo1 activity and down-modulation of CD62L (L-selectin) in HIV-1 infected resting CD4 T cells.

PLoS One. 2014 Oct 16;9(10):e110719. doi: 10.1371/journal.pone.0110719. eCollection 2014.
