Research Program in Systems Oncology, Research Programs Unit, Faculty of Medicine, University of Helsinki, 00014 Helsinki, Finland.
Research Center for Cancer, Infections and Immunity, Institute of Biomedicine, University of Turku, Turku 20014, Finland.
Bioinformatics. 2020 Dec 22;36(20):5086-5092. doi: 10.1093/bioinformatics/btaa637.
Non-parametric dimensionality reduction techniques, such as t-distributed stochastic neighbor embedding (t-SNE), are the most frequently used methods in the exploratory analysis of single-cell datasets. Current implementations scale poorly to massive datasets and often require downsampling or interpolative approximations, which can leave less-frequent populations undiscovered and much information unexploited.
We implemented a fast t-SNE package, qSNE, which uses a quasi-Newton optimizer, allowing a quadratic convergence rate, and includes an automatic perplexity (level of detail) optimizer. Our results show that these improvements make qSNE significantly faster than regular t-SNE packages and enable full analysis of large datasets, such as mass cytometry data, without downsampling.
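To make the "perplexity (level of detail)" parameter concrete, the following is a minimal sketch of how standard t-SNE relates perplexity to the Gaussian kernel width for a single data point. It is not qSNE's code and does not reproduce its automatic perplexity optimizer. The function names (`conditional_probabilities`, `perplexity`, `calibrate_beta`) and the toy distances are hypothetical, chosen only for illustration.

```python
import math


def conditional_probabilities(sq_dists, beta):
    """Gaussian affinities p_{j|i} of one point to its neighbors.

    sq_dists: squared distances from point i to every other point.
    beta: kernel precision, beta = 1 / (2 * sigma**2).
    """
    # Subtracting the smallest distance avoids underflow; it cancels on normalization.
    d_min = min(sq_dists)
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs):
    """Perplexity is 2 raised to the Shannon entropy (in bits) of the distribution."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy


def calibrate_beta(sq_dists, target, tol=1e-6, max_iter=200):
    """Bisection search for the beta whose distribution has the target perplexity.

    Assumes 1 < target < len(sq_dists). Perplexity decreases as beta grows,
    so the search raises beta when the distribution is too flat.
    """
    lo, hi = 0.0, None
    beta = 1.0
    for _ in range(max_iter):
        perp = perplexity(conditional_probabilities(sq_dists, beta))
        if abs(perp - target) < tol:
            break
        if perp > target:
            # Too many effective neighbors: narrow the kernel.
            lo = beta
            beta = beta * 2.0 if hi is None else (lo + hi) / 2.0
        else:
            # Too few effective neighbors: widen the kernel.
            hi = beta
            beta = (lo + hi) / 2.0
    return beta


# Toy example: one point with five neighbors at increasing distance.
toy_sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]
toy_beta = calibrate_beta(toy_sq_dists, target=3.0)
toy_probs = conditional_probabilities(toy_sq_dists, toy_beta)
```

Perplexity can be read as the effective number of neighbors each point attends to. A low value emphasizes fine local structure and a high value emphasizes broader structure, which is why the abstract glosses it as the level of detail.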
Source code and documentation are openly available at https://bitbucket.org/anthakki/qsne/.
Supplementary data are available at Bioinformatics online.