IBM Research Europe, Rüschlikon, Switzerland.
Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland.
Front Immunol. 2023 Apr 17;14:1123968. doi: 10.3389/fimmu.2023.1123968. eCollection 2023.
The adaptive immune system has the extraordinary ability to produce a broad range of immunoglobulins that can bind a wide variety of antigens. During adaptive immune responses, activated B cells duplicate and undergo somatic hypermutation in their B-cell receptor (BCR) genes, resulting in clonal families of diversified B cells that can be traced back to a common ancestor. Advances in high-throughput sequencing technologies have enabled the large-scale characterization of B-cell repertoires; however, the accurate identification of clonally related BCR sequences remains a major challenge. In this study, we compare three different clone identification methods on both simulated and experimental data, and investigate their impact on the characterization of B-cell diversity. We observe that different methods lead to different clonal definitions, which affects the quantification of clonal diversity in repertoire data. Our analyses show that clonal clusterings and clonal diversity of different repertoires should not be compared directly if different clone identification methods were used to define the clones. Despite this variability, the diversity indices inferred from the repertoires' clonal characterization show similar patterns of variation across samples regardless of the clone identification method used. We find the Shannon entropy to be the most robust index in terms of the variability of diversity rank across samples. Our analysis also suggests that the traditional germline gene alignment-based method for clonal identification remains the most accurate when complete sequence information is available, but that alignment-free methods may be preferred for shorter sequencing read lengths. We make our implementation freely available as the Python library cdiversity.
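As an illustration of the diversity index discussed above, the following is a minimal sketch of how Shannon entropy could be computed from clone assignments. It uses only the Python standard library and does not reproduce the cdiversity API; the function name `shannon_entropy` and the example clone labels are assumptions made for illustration, not data or code from the study.

```python
import math
from collections import Counter


def shannon_entropy(clone_labels):
    """Shannon entropy (in nats) of the clone-size distribution.

    clone_labels holds one clone identifier per sequence, so the size of
    a clone is the number of times its identifier appears.
    """
    counts = Counter(clone_labels)
    total = sum(counts.values())
    # H = -sum_i p_i * ln(p_i), where p_i is the relative size of clone i
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical clone assignments for the same six sequences, as two
# different clone identification methods might produce them.
labels_method_a = ["c1", "c1", "c1", "c2", "c2", "c3"]
labels_method_b = ["c1", "c1", "c1", "c1", "c2", "c2"]

entropy_a = shannon_entropy(labels_method_a)
entropy_b = shannon_entropy(labels_method_b)
print(f"method A: {entropy_a:.3f} nats, method B: {entropy_b:.3f} nats")
```

In this toy case the two groupings of the same sequences give different entropy values, which is consistent with the caution above against directly comparing diversity estimates obtained with different clone identification methods.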