Han Jiami, Masserey Solène, Shlesinger Danielle, Kuhn Raphael, Papadopoulou Chrysa, Agrafiotis Andreas, Kreiner Victor, Dizerens Raphael, Hong Kai-Lin, Weber Cédric, Greiff Victor, Oxenius Annette, Reddy Sai T, Yermanos Alexander
Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland.
Department of Immunology, University of Oslo, Oslo 0450, Norway.
Bioinform Adv. 2022 Sep 2;2(1):vbac062. doi: 10.1093/bioadv/vbac062. eCollection 2022.
Single-cell sequencing now enables the recovery of full-length immune receptor repertoires [B cell receptor (BCR) and T cell receptor (TCR) repertoires], in addition to gene expression information. The feature-rich datasets produced from such experiments require extensive and diverse computational analyses, each of which can significantly influence downstream immunological interpretations, such as clonal selection and expansion. Simulations produce validated standard datasets in which the underlying generative model can be precisely defined and further perturbed to investigate specific questions of interest. Currently, there is no tool that can simulate single-cell datasets incorporating both immune receptor repertoires and gene expression.
We developed Echidna, an R package that simulates immune receptors and transcriptomes at single-cell resolution with user-tunable parameters controlling a wide range of features such as clonal expansion, germline gene usage, somatic hypermutation, transcriptional phenotypes and spatial location. Echidna can additionally simulate time-resolved B cell evolution, producing mutational networks with complex selection histories incorporating class-switching and B cell subtype information. We demonstrated the benchmarking potential of Echidna by simulating clonal lineages and comparing the known simulated networks with those inferred from only the BCR sequences as input. Finally, we simulated immune repertoire information onto existing spatial transcriptomic experiments, thereby generating novel datasets that could be used to develop and integrate methods to profile clonal selection in a spatially resolved manner. Together, Echidna provides a framework that can incorporate experimental data to simulate single-cell immune repertoires to aid software development and bioinformatic benchmarking of clonotyping, phylogenetics, transcriptomics and machine learning strategies.
The R package and code used in this manuscript can be found at github.com/alexyermanos/echidna and also in the R package Platypus (Yermanos et al., 2021). Installation instructions and the vignette for Echidna are described in the Platypus Computational Ecosystem (https://alexyermanos.github.io/Platypus/index.html). Publicly available data and corresponding sample accession numbers can be found in Supplementary Tables S2 and S3.
Supplementary data are available at Bioinformatics Advances online.