Center for Data Sciences, Brigham and Women's Hospital, Boston, MA, USA.
Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
Nat Commun. 2021 Oct 7;12(1):5890. doi: 10.1038/s41467-021-25957-x.
Recent advances in single-cell technologies and integration algorithms make it possible to construct comprehensive reference atlases encompassing many donors, studies, disease states, and sequencing platforms. Much like mapping sequencing reads to a reference genome, it is essential to be able to map query cells onto complex, multimillion-cell reference atlases to rapidly identify relevant cell states and phenotypes. We present Symphony (https://github.com/immunogenomics/symphony), an algorithm for building large-scale, integrated reference atlases in a convenient, portable format that enables efficient query mapping within seconds. Symphony localizes query cells within a stable low-dimensional reference embedding, facilitating reproducible downstream transfer of reference-defined annotations to the query. We demonstrate the power of Symphony in multiple real-world datasets, including (1) mapping a multi-donor, multi-species query to predict pancreatic cell types, (2) localizing query cells along a developmental trajectory of fetal liver hematopoiesis, and (3) inferring surface protein expression with a multimodal CITE-seq atlas of memory T cells.
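The general idea of placing query cells in a fixed reference embedding and then transferring reference annotations can be illustrated with a small sketch. This is not Symphony's algorithm or API: it swaps in plain PCA for the integrated embedding and a k-nearest-neighbor majority vote for annotation transfer, and the function names (`build_reference`, `map_query`, `transfer_labels`) are hypothetical, chosen only for illustration.

```python
import numpy as np
from collections import Counter


def build_reference(ref_expr, n_dims=2):
    """Fit a fixed low-dimensional embedding on reference cells.

    ref_expr: array of shape (cells, genes). Plain PCA is an assumption
    made for this sketch; it stands in for an integrated atlas embedding.
    """
    mean = ref_expr.mean(axis=0)
    centered = ref_expr - mean
    # SVD-based PCA: rows of vt are the principal gene loadings.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_dims].T  # shape (genes, n_dims)
    return {"mean": mean, "loadings": loadings, "embedding": centered @ loadings}


def map_query(reference, query_expr):
    """Project query cells into the reference embedding without refitting it."""
    return (query_expr - reference["mean"]) @ reference["loadings"]


def transfer_labels(reference, ref_labels, query_embedding, k=5):
    """Assign each query cell the majority label of its k nearest reference cells."""
    ref_emb = reference["embedding"]
    predicted = []
    for cell in query_embedding:
        dists = np.linalg.norm(ref_emb - cell, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = Counter(ref_labels[i] for i in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return predicted


if __name__ == "__main__":
    # Synthetic data: two well-separated cell populations over 20 genes.
    rng = np.random.default_rng(0)
    ref = np.vstack([rng.normal(0, 1, (50, 20)), rng.normal(10, 1, (50, 20))])
    labels = ["type_a"] * 50 + ["type_b"] * 50
    reference = build_reference(ref)
    query = np.vstack([rng.normal(0, 1, (3, 20)), rng.normal(10, 1, (3, 20))])
    print(transfer_labels(reference, labels, map_query(reference, query)))
```

Because the reference embedding is computed once and only reused at query time, the same query always lands at the same coordinates, which is the property that makes downstream annotation transfer reproducible.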