Amyloidosis Center, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, United States.
Research Computing Services, Boston University, Boston, MA, United States.
Front Immunol. 2023 Apr 18;14:1167235. doi: 10.3389/fimmu.2023.1167235. eCollection 2023.
Monoclonal antibody light chain proteins secreted by clonal plasma cells cause tissue damage due to amyloid deposition and other mechanisms. The unique protein sequence associated with each case contributes to the diversity of clinical features observed in patients. Extensive work has characterized many light chains associated with multiple myeloma, light chain amyloidosis and other disorders, which we have collected in the publicly accessible database, AL-Base. However, light chain sequence diversity makes it difficult to determine the contribution of specific amino acid changes to pathology. Sequences of light chains associated with multiple myeloma provide a useful comparison to study mechanisms of light chain aggregation, but relatively few monoclonal sequences have been determined. Therefore, we sought to identify complete light chain sequences from existing high throughput sequencing data.
We developed a computational approach using the MiXCR suite of tools to extract complete rearranged sequences from untargeted RNA sequencing data. This method was applied to whole-transcriptome RNA sequencing data from 766 newly diagnosed patients in the Multiple Myeloma Research Foundation CoMMpass study.
Monoclonal sequences were defined as those where >50% of assigned κ or λ reads from each sample mapped to a unique sequence. Clonal light chain sequences were identified in 705/766 samples from the CoMMpass study. Of these, 685 sequences covered the complete variable region. The identity of the assigned sequences is consistent with their associated clinical data and with partial sequences previously determined from the same cohort of samples. Sequences have been deposited in AL-Base.
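The monoclonality criterion above can be sketched as a per-sample calculation over a clone table. This is a minimal, hypothetical reconstruction and not the authors' code. It assumes that each sample has already been reduced to clone records with a chain label, an assembled sequence and a read count (the `Clone` fields are illustrative, not MiXCR's export column names). It also assumes that the denominator is the total number of reads assigned to either IGK or IGL. A sample is called monoclonal when the single most abundant unique light chain sequence holds strictly more than 50% of those reads.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


# Hypothetical record: one row of a per-sample clone table.
# Field names are illustrative only, not a real MiXCR export format.
@dataclass(frozen=True)
class Clone:
    chain: str       # "IGK", "IGL", "IGH", ...
    sequence: str    # assembled rearranged sequence
    read_count: int  # reads assigned to this clone


# Assumption: only kappa and lambda reads form the denominator.
LIGHT_CHAINS = {"IGK", "IGL"}


def call_monoclonal_light_chain(
    clones: Iterable[Clone], threshold: float = 0.5
) -> Optional[Tuple[str, str, float]]:
    """Return (chain, sequence, fraction) if one unique light chain sequence
    accounts for more than `threshold` of all kappa/lambda reads, else None."""
    reads_per_seq: Counter = Counter()
    chain_of: dict = {}
    for c in clones:
        if c.chain in LIGHT_CHAINS:
            # Pool reads by unique sequence, so identical sequences reported
            # as separate clones are merged before the fraction is computed.
            reads_per_seq[c.sequence] += c.read_count
            chain_of[c.sequence] = c.chain
    total = sum(reads_per_seq.values())
    if total == 0:
        return None  # no light chain reads assigned in this sample
    seq, top = reads_per_seq.most_common(1)[0]
    fraction = top / total
    # Strictly greater than the threshold, mirroring the ">50%" criterion.
    if fraction > threshold:
        return chain_of[seq], seq, fraction
    return None


if __name__ == "__main__":
    sample = [
        Clone("IGK", "SEQ_A", 900),
        Clone("IGK", "SEQ_B", 50),
        Clone("IGL", "SEQ_C", 50),
        Clone("IGH", "SEQ_H", 5000),  # heavy chain reads are ignored
    ]
    print(call_monoclonal_light_chain(sample))  # ('IGK', 'SEQ_A', 0.9)
```

Using a strict inequality means that a sample split evenly between two sequences is not called monoclonal. Under this sketch, the 61 of 766 samples without a clonal call would correspond to samples where no single sequence reached the threshold.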
Our method allows routine identification of clonal antibody sequences from RNA sequencing data collected for gene expression studies. The sequences identified represent, to our knowledge, the largest collection of multiple myeloma-associated light chains reported to date. This work substantially increases the number of monoclonal light chains known to be associated with non-amyloid plasma cell disorders and will facilitate studies of light chain pathology.