Hopper: a mathematically optimal algorithm for sketching biological data.

Affiliations

Department of Bioinformatics, Harvard University, Cambridge, MA 02138, USA.

Computer Science and Artificial Intelligence Laboratory.

Publication information

Bioinformatics. 2020 Jul 1;36(Suppl_1):i236-i241. doi: 10.1093/bioinformatics/btaa408.

Abstract

MOTIVATION

Single-cell RNA-sequencing has grown massively in scale since its inception, presenting substantial analytic and computational challenges. Even simple downstream analyses, such as dimensionality reduction and clustering, require days of runtime and hundreds of gigabytes of memory for today's largest datasets. In addition, current methods often favor common cell types, and miss salient biological features captured by small cell populations.

RESULTS

Here we present Hopper, a single-cell toolkit that both speeds up the analysis of single-cell datasets and highlights their transcriptional diversity by intelligent subsampling, or sketching. Hopper realizes the optimal polynomial-time approximation of the Hausdorff distance between the full and downsampled dataset, ensuring that each cell is well-represented by some cell in the sample. Unlike prior sketching methods, Hopper adds points iteratively and allows for additional sampling from regions of interest, enabling fast and targeted multi-resolution analyses. In a dataset of over 1.3 million mouse brain cells, Hopper detects a cluster of just 64 macrophages expressing inflammatory genes (0.004% of the full dataset) from a Hopper sketch containing just 5000 cells, and several other small but biologically interesting immune cell populations invisible to analysis of the full data. On an even larger dataset consisting of ∼2 million developing mouse organ cells, we show Hopper's even representation of important cell types in small sketches, in contrast with prior sketching methods. We also introduce Treehopper, which uses spatial partitioning to speed up Hopper by orders of magnitude with minimal loss in performance. By condensing transcriptional information encoded in large datasets, Hopper and Treehopper grant the individual user with a laptop the analytic capabilities of a large consortium.
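The iterative selection described above can be illustrated with farthest-first traversal, the classical greedy procedure that gives a 2-approximation to the k-center objective, which here equals the Hausdorff distance between the full dataset and the sketch. The following is a minimal sketch under that assumption, not the code from the Hopper repository: the function name `farthest_first_sketch`, the Euclidean metric, the random starting cell, and the toy two-dimensional data are all illustrative choices. Each step adds the cell that is currently worst represented, so a small, distant population is picked up early even when it is a tiny fraction of the data.

```python
import math
import random


def farthest_first_sketch(points, k, seed=0):
    """Pick k point indices by farthest-first traversal (illustrative only).

    Returns (indices, radius), where radius is the largest distance from any
    point to its nearest selected point, i.e. the Hausdorff distance between
    the full set and the sketch.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    first = rng.randrange(len(points))  # assumption: arbitrary starting cell
    chosen = [first]
    # dist[i] = distance from point i to its nearest selected point so far
    dist = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        # The next sketch point is the one that is worst represented so far.
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < dist[i]:
                dist[i] = d
    return chosen, max(dist)


# Toy data: 1000 "common" cells near the origin and 5 "rare" cells far away.
data_rng = random.Random(1)
common = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(1000)]
rare = [(50 + data_rng.gauss(0, 0.1), 50 + data_rng.gauss(0, 0.1)) for _ in range(5)]
cells = common + rare

indices, radius = farthest_first_sketch(cells, k=10)
rare_hits = [i for i in indices if i >= len(common)]
print("sketch indices:", indices)
print("rare cells in sketch:", rare_hits)
print("covering radius:", radius)
```

Because points are added one at a time, a sketch of size k is always a prefix of the sketch of size k + 1, so a larger sketch can extend a smaller one without recomputation. Each added point costs one pass over the data in this version, which is the cost that a spatial-partitioning variant such as Treehopper aims to reduce.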

AVAILABILITY AND IMPLEMENTATION

The code for Hopper is available at https://github.com/bendemeo/hopper. In addition, we have provided sketches of many of the largest single-cell datasets, available at http://hopper.csail.mit.edu.
