Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT.
Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, CT.
J Immunol. 2024 May 15;212(10):1579-1588. doi: 10.4049/jimmunol.2300851.
Abs are vital to human immune responses and are composed of genetically variable H and L chains. These structures are initially expressed as BCRs. BCR diversity is shaped through somatic hypermutation and selection during immune responses. This evolutionary process produces B cell clones, cells that descend from a common ancestor but differ by mutations. Phylogenetic trees inferred from BCR sequences can reconstruct the history of mutations within a clone. Until recently, BCR sequencing technologies separated H and L chains, but advancements in single-cell sequencing now pair H and L chains from individual cells. However, it is unclear how these separate genes should be combined to infer B cell phylogenies. In this study, we investigated strategies for using paired H and L chain sequences to build phylogenetic trees. We found that incorporating L chains significantly improved tree accuracy and reproducibility across all methods tested. This improvement was greater than the difference between tree-building methods and persisted even when mixing bulk and single-cell sequencing data. However, we also found that many phylogenetic methods estimated significantly biased branch lengths when some L chains were missing, such as when mixing single-cell and bulk BCR data. This bias was eliminated using maximum likelihood methods with separate branch lengths for H and L chain gene partitions. Thus, we recommend using maximum likelihood methods with separate H and L chain partitions, especially when mixing data types. We implemented these methods in the R package Dowser: https://dowser.readthedocs.io.
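The recommendation to keep H and L chain partitions separate can be illustrated with a toy calculation. The sketch below is not Dowser's implementation and does not use maximum likelihood. It uses plain per-site Hamming distances on invented sequences, and every name in it (`Cell`, `hamming_fraction`, `partitioned_distances`, `pooled_distance`) is hypothetical. It only shows that a single pooled distance changes depending on whether an L chain was observed, while per-partition distances do not.

```python
from typing import Optional, Tuple

# A cell is (heavy_seq, light_seq); light_seq is None when only the H chain
# was sequenced (for example, a bulk BCR read). Hypothetical toy data model.
Cell = Tuple[str, Optional[str]]


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of aligned sites that differ between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def partitioned_distances(c1: Cell, c2: Cell) -> dict:
    """Separate H and L distances; L is None if either cell lacks an L chain."""
    (h1, l1), (h2, l2) = c1, c2
    light = hamming_fraction(l1, l2) if l1 is not None and l2 is not None else None
    return {"H": hamming_fraction(h1, h2), "L": light}


def pooled_distance(c1: Cell, c2: Cell) -> float:
    """One distance over all sites observed in both cells (H and L concatenated)."""
    (h1, l1), (h2, l2) = c1, c2
    if l1 is not None and l2 is not None:
        return hamming_fraction(h1 + l1, h2 + l2)
    return hamming_fraction(h1, h2)


# Invented sequences: two H chain mutations, no L chain mutations.
ancestor: Cell = ("AAAAAAAAAA", "CCCCCCCCCC")
paired: Cell = ("AAAAAAAATT", "CCCCCCCCCC")   # single-cell: H and L observed
heavy_only: Cell = ("AAAAAAAATT", None)       # bulk: H chain only

print(pooled_distance(ancestor, paired))            # 0.1
print(pooled_distance(ancestor, heavy_only))        # 0.2
print(partitioned_distances(ancestor, paired))      # {'H': 0.2, 'L': 0.0}
print(partitioned_distances(ancestor, heavy_only))  # {'H': 0.2, 'L': None}
```

In this toy case, the same two H chain mutations give a pooled distance of 0.1 when the L chain is present and 0.2 when it is missing, whereas the H partition distance stays at 0.2 in both comparisons. This is meant only as intuition for why a shared branch-length scale can be inconsistent in mixed data, not as a description of the bias measured in the study.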