Walsh Ian, Choo Matthew S F, Chiin Sim Lyn, Mak Amelia, Tay Shi Jie, Rudd Pauline M, Yuansheng Yang, Choo Andre, Swan Ho Ying, Nguyen-Khuong Terry
Analytics Group, Bioprocessing Technology Institute, Agency for Science, Technology and Research, Singapore 138668.
University College Dublin, Belfield, Dublin, Ireland.
Beilstein J Org Chem. 2020 Aug 27;16:2087-2099. doi: 10.3762/bjoc.16.176. eCollection 2020.
The accurate assessment of antibody glycosylation during bioprocessing requires the high-throughput generation of large amounts of glycomics data. This allows bioprocess engineers to identify critical process parameters that control the glycosylation critical quality attributes. Advances in protocols for capillary electrophoresis-laser-induced fluorescence (CE-LIF) measurements of antibody N-glycans have increased the potential for generating large datasets of N-glycosylation values for assessment. With large cohorts of CE-LIF data, peak picking and peak area calculation remain a problem for fast and accurate quantitation, despite the presence of internal and external standards to reduce misalignment in the qualitative analysis. These problems are often due to fluctuations introduced by varying process conditions, which result in heterogeneous peak shapes. Additionally, co-eluting glycans can produce non-Gaussian peaks under some process conditions but not others. Here, we describe an approach to quantitatively and qualitatively curate large-cohort CE-LIF glycomics data. For glycan identification, a previously reported method based on internal triple standards is used. To determine relative glycan quantities, our method uses a clustering algorithm to 'divide and conquer' highly heterogeneous electropherograms into similar groups, making it easier to define peaks manually. Open-source software is then used to determine the areas of the manually defined peaks. We successfully applied this semi-automated method to a dataset of 391 glycoprofiles of monoclonal antibody biosimilars from a bioreactor optimization study. The key advantage of this computational approach is that all runs can be analyzed simultaneously with high accuracy in glycan identification and quantitation, and there is no theoretical limit to the scale of the method.
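The 'divide and conquer' idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the clustering algorithm nor the open-source integration software, so a greedy correlation-based grouping and a plain trapezoidal integration stand in for them here. All function names, the `min_corr` threshold, and the peak windows are hypothetical, and the sketch assumes that traces are already aligned on a common time axis and that no baseline correction is needed.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equally long traces."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat trace has no defined shape similarity
    return cov / sqrt(var_a * var_b)


def leader_cluster(traces, min_corr=0.9):
    """Greedy grouping (assumed stand-in for the paper's clustering step).

    Each trace joins the first group whose leader it correlates with at
    or above min_corr; otherwise it starts a new group.
    """
    leaders, labels = [], []
    for trace in traces:
        for k, leader in enumerate(leaders):
            if pearson(trace, leader) >= min_corr:
                labels.append(k)
                break
        else:
            leaders.append(trace)
            labels.append(len(leaders) - 1)
    return labels


def peak_area(times, signal, start, end):
    """Trapezoidal area over the sample intervals lying fully inside [start, end]."""
    area = 0.0
    for i in range(len(times) - 1):
        if times[i] >= start and times[i + 1] <= end:
            area += 0.5 * (signal[i] + signal[i + 1]) * (times[i + 1] - times[i])
    return area


def relative_areas(times, signal, windows):
    """Percent area of each manually defined peak window."""
    areas = {name: peak_area(times, signal, s, e) for name, (s, e) in windows.items()}
    total = sum(areas.values())
    if total == 0:
        return areas
    return {name: 100.0 * a / total for name, a in areas.items()}


# Toy data: two traces with the same shape and one with a different shape.
times = [float(t) for t in range(10)]
traces = [
    [0, 1, 2, 1, 0, 0, 1, 2, 1, 0],
    [0, 2, 4, 2, 0, 0, 2, 4, 2, 0],
    [0, 0, 0, 0, 3, 6, 3, 0, 0, 0],
]
labels = leader_cluster(traces)

# Peak windows are defined once per group (hypothetical names and limits).
windows_by_group = {
    0: {"peak_1": (0.0, 4.0), "peak_2": (5.0, 9.0)},
    1: {"peak_1": (3.0, 7.0)},
}
for trace, label in zip(traces, labels):
    print(label, relative_areas(times, trace, windows_by_group[label]))
```

Because the windows are attached to a group rather than to an individual run, the manual step grows with the number of distinct profile shapes instead of the number of runs, which is the practical appeal of grouping first.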