Singapore Immunology Network, Agency for Science, Technology and Research, Singapore.
Department of Paediatrics, National University of Singapore, Singapore.
Bioinformatics. 2019 Jan 15;35(2):301-308. doi: 10.1093/bioinformatics/bty491.
Recent flow and mass cytometers generate datasets with 20 to 40 dimensions and a million single cells. Many tools facilitate the discovery, from these datasets, of new cell populations associated with diseases or physiology. These new cell populations require the identification of new gating strategies, but gating strategies become exponentially more difficult to optimize as dimensionality increases. To facilitate this step, we developed Hypergate, an algorithm that, given a cell population of interest, identifies a gating strategy optimized for high yield and purity.
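To make the two optimization criteria concrete, the following is a minimal sketch, not Hypergate's implementation. It assumes the usual definitions: yield is the fraction of the cells of interest that fall inside the gate, and purity is the fraction of gated cells that belong to the population of interest. It also assumes a gate can be represented as one interval per selected channel. The names `Gate`, `in_gate` and `yield_and_purity`, and the toy data, are hypothetical and introduced only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed representation: channel index -> (lower, upper) threshold pair.
Gate = Dict[int, Tuple[float, float]]


def in_gate(cell: Sequence[float], gate: Gate) -> bool:
    """Return True if the cell lies inside every interval of the gate."""
    return all(lo <= cell[ch] <= hi for ch, (lo, hi) in gate.items())


def yield_and_purity(
    cells: List[Sequence[float]], is_target: List[bool], gate: Gate
) -> Tuple[float, float]:
    """Compute (yield, purity) of a gate for a labelled set of cells."""
    gated = [in_gate(cell, gate) for cell in cells]
    true_pos = sum(1 for g, t in zip(gated, is_target) if g and t)
    n_target = sum(is_target)
    n_gated = sum(gated)
    # Guard against empty denominators (no target cells, or an empty gate).
    yield_ = true_pos / n_target if n_target else 0.0
    purity = true_pos / n_gated if n_gated else 0.0
    return yield_, purity


# Toy data: six cells measured on two channels (made-up values).
cells = [
    (0.9, 0.8),
    (0.8, 0.7),
    (0.7, 0.2),
    (0.2, 0.9),
    (0.1, 0.1),
    (0.85, 0.9),
]
is_target = [True, True, False, False, False, True]

one_channel_gate: Gate = {0: (0.6, 1.0)}
two_channel_gate: Gate = {0: (0.6, 1.0), 1: (0.5, 1.0)}

print(yield_and_purity(cells, is_target, one_channel_gate))
print(yield_and_purity(cells, is_target, two_channel_gate))
```

On this toy data, gating on the first channel alone captures every target cell but also one contaminating cell; adding a threshold on the second channel removes the contaminant without losing any target cell. This is the kind of trade-off between the number of parameters, yield and purity that a gating strategy has to balance.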
Hypergate achieves higher yield and purity than human experts, Support Vector Machines and Random Forests on public datasets. We use it to revisit some established gating strategies for the identification of innate lymphoid cells, and it identifies concise and efficient strategies that allow gating these cells with fewer parameters but higher yield and purity than the current standards. For phenotypic description, Hypergate's outputs are consistent with the field's knowledge and sparser than those from a competing method.
Hypergate is implemented in R and available on CRAN. The source code is published at http://github.com/ebecht/hypergate under an Open Source Initiative-compliant licence.
Supplementary data are available at Bioinformatics online.