Department of Biosystems Science and Engineering, ETH Zürich, 4058 Basel, Switzerland.
Department of Immunology, University of Oslo, 0372 Oslo, Norway.
Bioinformatics. 2020 Jun 1;36(11):3594-3596. doi: 10.1093/bioinformatics/btaa158.
B- and T-cell receptor repertoires of the adaptive immune system have become a key target for diagnostics and therapeutics research. Consequently, there is a rapidly growing number of bioinformatics tools for immune repertoire analysis. Benchmarking of such tools is crucial for ensuring reproducible and generalizable computational analyses. Currently, however, it remains challenging to create standardized ground truth immune receptor repertoires for immunoinformatics tool benchmarking.
Therefore, we developed immuneSIM, an R package that allows the simulation of native-like and aberrant synthetic full-length variable region immune receptor sequences by tuning the following immune receptor features: (i) species and chain type (BCR, TCR, single and paired), (ii) germline gene usage, (iii) occurrence of insertions and deletions, (iv) clonal abundance, (v) somatic hypermutation and (vi) sequence motifs. Each simulated sequence is annotated by the complete set of simulation events that contributed to its in silico generation. immuneSIM permits the benchmarking of key computational tools for immune receptor analysis, such as germline gene annotation, diversity and overlap estimation, sequence similarity, network architecture, clustering analysis and machine learning methods for motif detection.
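To make the idea of an event-annotated simulated sequence concrete, the following is a minimal toy sketch in Python. It is not immuneSIM's API (immuneSIM is an R package) and does not reproduce its algorithms. The gene segments, the names `simulate_sequence` and `SimulatedSequence`, and all parameters (`max_del`, `max_ins`, `shm_prob`, `alpha`) are hypothetical and chosen only for illustration. The sketch assumes uniform random mutation in place of a realistic somatic hypermutation model, and a Pareto draw as a stand-in for a heavy-tailed clonal abundance distribution.

```python
import random
from dataclasses import dataclass

# Hypothetical toy germline segments; real germline genes are much longer.
V_GENES = {"V1": "CAGGTGCAGCTG", "V2": "GAGGTGCAGCTG"}
D_GENES = {"D1": "GGTACAACTGG", "D2": "AGGATATTGTA"}
J_GENES = {"J1": "ACTACTTTGACTAC", "J2": "ACTGGTTCGACCCC"}


@dataclass
class SimulatedSequence:
    """A simulated sequence plus every event that produced it (ground truth)."""
    sequence: str
    v_gene: str
    d_gene: str
    j_gene: str
    v_deleted: int      # nucleotides trimmed from the 3' end of V
    j_deleted: int      # nucleotides trimmed from the 5' end of J
    inserted_vd: str    # random nucleotides inserted at the V-D junction
    inserted_dj: str    # random nucleotides inserted at the D-J junction
    mutations: list     # (position, original base, new base) tuples
    abundance: int      # clonal abundance (number of copies)


def random_nt(rng, n):
    """Return n uniformly random nucleotides."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def simulate_sequence(rng, v_weights=None, max_del=3, max_ins=4,
                      shm_prob=0.02, alpha=2.0):
    # Germline gene usage: V is drawn with optional weights, D and J uniformly.
    v_name = rng.choices(list(V_GENES), weights=v_weights)[0]
    d_name = rng.choice(list(D_GENES))
    j_name = rng.choice(list(J_GENES))

    # Deletions at the gene ends and insertions at the junctions.
    v_del = rng.randint(0, max_del)
    j_del = rng.randint(0, max_del)
    v_seq = V_GENES[v_name]
    v_part = v_seq[: len(v_seq) - v_del]
    j_part = J_GENES[j_name][j_del:]
    ins_vd = random_nt(rng, rng.randint(0, max_ins))
    ins_dj = random_nt(rng, rng.randint(0, max_ins))

    # Point mutations: each position mutates independently with shm_prob
    # (an assumed uniform model, not a hotspot-aware SHM model).
    bases = list(v_part + ins_vd + D_GENES[d_name] + ins_dj + j_part)
    mutations = []
    for pos, base in enumerate(bases):
        if rng.random() < shm_prob:
            new = rng.choice([b for b in "ACGT" if b != base])
            bases[pos] = new
            mutations.append((pos, base, new))

    # Heavy-tailed clonal abundance via a Pareto draw (always >= 1).
    abundance = int(rng.paretovariate(alpha))

    return SimulatedSequence("".join(bases), v_name, d_name, j_name,
                             v_del, j_del, ins_vd, ins_dj,
                             mutations, abundance)


if __name__ == "__main__":
    rng = random.Random(42)
    for record in (simulate_sequence(rng) for _ in range(3)):
        print(record)
```

Because every record carries its own generating events, the output of a tool under test (for example, a germline gene annotator) can be compared directly against the recorded `v_gene`, `d_gene` and `j_gene` values.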
The package is available via https://github.com/GreiffLab/immuneSIM and on CRAN at https://cran.r-project.org/web/packages/immuneSIM. The documentation is hosted at https://immuneSIM.readthedocs.io.
Contact: sai.reddy@ethz.ch or victor.greiff@medisin.uio.no.
Supplementary data are available at Bioinformatics online.