optimalFlow: optimal transport approach to flow cytometry gating and population matching.

Author information

Departamento de Estadística e Investigación Operativa, Universidad de Valladolid, Calle Paseo de Belén, Valladolid, Spain.

IMUVA, Calle Paseo de Belén, Valladolid, Spain.

Publication information

BMC Bioinformatics. 2020 Oct 27;21(1):479. doi: 10.1186/s12859-020-03795-w.

Abstract

BACKGROUND

Data obtained from flow cytometry present pronounced variability due to biological and technical reasons. Biological variability is a well-known phenomenon produced by measurements on different individuals, with different characteristics such as illness, age, sex, etc. The use of different settings for measurement, the variation of the conditions during experiments and the different types of flow cytometers are some of the technical causes of variability. This mixture of sources of variability makes the use of supervised machine learning for identification of cell populations difficult. The present work is conceived as a combination of strategies to facilitate the task of supervised gating.

RESULTS

We propose optimalFlowTemplates, based on a similarity distance and Wasserstein barycenters, which clusters cytometries and produces a prototype cytometry for each group. We show that supervised learning, restricted to the new groups, performs better than the same techniques applied to the whole collection. We also present optimalFlowClassification, which uses a database of gated cytometries and optimalFlowTemplates to assign cell types to a new cytometry. We show that this procedure can outperform state-of-the-art techniques on the proposed datasets. Our code is freely available as optimalFlow, a Bioconductor R package, at https://bioconductor.org/packages/optimalFlow.
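To give a feel for the two optimal transport notions named above (a Wasserstein distance and a Wasserstein barycenter), here is a minimal Python sketch for the simplest possible case: one-dimensional samples of equal size with uniform weights. In that setting the 2-Wasserstein distance reduces to comparing sorted values, and the barycenter is obtained by averaging order statistics. This is an illustration only and not the optimalFlow implementation: the package is written in R and works on multi-dimensional gated cytometries, and the function names `wasserstein2_1d` and `barycenter_1d` are hypothetical.

```python
import math


def wasserstein2_1d(xs, ys):
    """2-Wasserstein distance between two equal-size 1-D samples.

    Assumption for this sketch: both samples carry uniform weights,
    so the optimal coupling matches the i-th smallest value of one
    sample with the i-th smallest value of the other.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    a, b = sorted(xs), sorted(ys)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def barycenter_1d(samples):
    """Uniform-weight 2-Wasserstein barycenter of equal-size 1-D samples.

    In one dimension the barycenter is found by averaging the order
    statistics (that is, the empirical quantiles) across samples.
    """
    sorted_samples = [sorted(s) for s in samples]
    n = len(sorted_samples[0])
    if any(len(s) != n for s in sorted_samples):
        raise ValueError("samples must have the same size")
    k = len(sorted_samples)
    return [sum(s[i] for s in sorted_samples) / k for i in range(n)]


# Toy "marker intensity" samples standing in for two cytometries.
sample_a = [0.0, 1.0, 2.0]
sample_b = [4.0, 2.0, 3.0]

distance = wasserstein2_1d(sample_a, sample_b)
prototype = barycenter_1d([sample_a, sample_b])
```

In this toy example the barycenter plays the role of a prototype for the two samples, loosely analogous to the way a template summarizes a group of similar cytometries.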

CONCLUSIONS

optimalFlowTemplates + optimalFlowClassification addresses the problem of using supervised learning while accounting for biological and technical variability. Our methodology provides a robust automated gating workflow that handles the intrinsic variability of flow cytometry data well. Our main innovations are the methodology itself and the optimal transport techniques that we apply to flow cytometry analysis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d5ff/7590740/ded7a161b256/12859_2020_3795_Fig1_HTML.jpg
