Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
UK Dementia Research Institute, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK.
Nat Methods. 2021 Mar;18(3):262-271. doi: 10.1038/s41592-021-01076-9. Epub 2021 Mar 1.
Single-cell technologies have made it possible to profile millions of cells, but for these resources to be useful they must be easy to query and access. To facilitate interactive and intuitive access to single-cell data, we have developed scfind, a single-cell analysis tool that enables fast search of biologically or clinically relevant marker genes in cell atlases. Using transcriptome data from six mouse cell atlases, we show how scfind can be used to evaluate marker genes, perform in silico gating, and identify both cell-type-specific and housekeeping genes. Moreover, we have developed a subquery optimization routine to ensure that long and complex queries return meaningful results. To make scfind more user-friendly, we use indices of PubMed abstracts and techniques from natural language processing to allow for arbitrary queries. Finally, we show how scfind can be used for multi-omics analyses by combining single-cell ATAC-seq data with transcriptome data.