Vormittag Philipp, Klamp Thorsten, Hubbuch Jürgen
Institute of Engineering in Life Sciences, Section IV: Biomolecular Separation Engineering, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany.
BioNTech SE, Mainz, Germany.
Front Bioeng Biotechnol. 2020 May 5;8:395. doi: 10.3389/fbioe.2020.00395. eCollection 2020.
Virus-like particles (VLPs) are protein-based nanoscale structures that show high potential as immunotherapeutics or cargo delivery vehicles. Chimeric VLPs are decorated with foreign peptides, resulting in structures that elicit immune responses against the displayed epitope. However, insertion of foreign sequences often results in insoluble proteins, calling for methods capable of assessing a VLP candidate's solubility. The prediction of VLP solubility requires a model that can identify critical hydrophobicity-related parameters, distinguishing between VLP-forming aggregation and aggregation leading to insoluble virus protein clusters. Therefore, we developed and implemented a soft ensemble vote classifier (sEVC) framework based on chimeric hepatitis B core antigen (HBcAg) amino acid sequences and 91 publicly available hydrophobicity scales. Based on each hydrophobicity scale, an individual decision tree was induced as a classifier in the sEVC. An embedded feature selection algorithm and stratified sampling proved beneficial for model construction. In a learning experiment, model performance was explored as a function of training set size and the number of classifiers included in the sEVC. Additionally, seven models were created from training data of 24-384 chimeric HBcAg constructs and validated by 100-fold Monte Carlo cross-validation. The models predicted external test sets of 184-544 chimeric HBcAg constructs. The best models showed a Matthews correlation coefficient of >0.6 on the validation and the external test set. Feature selection was evaluated for the classifiers with the best and worst performance in the chimeric HBcAg VLP solubility scenario. Analysis of the associated hydrophobicity scales allowed for retrieval of biological information related to the mechanistic background of VLP solubility, suggesting a special role of arginine in VLP assembly and solubility. In the future, the developed sEVC could be applied to hydrophobicity-related problems in other domains, such as monoclonal antibodies.
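The soft-voting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single feature per scale (mean hydrophobicity over the sequence), replaces each decision tree with a depth-1 stump, omits feature selection and stratified sampling, and uses invented peptide sequences and labels. The Kyte-Doolittle values are the published scale; `TOY_SCALE`, `Stump`, `fit_ensemble`, `predict` and the training data are hypothetical names and values introduced here for illustration only.

```python
from math import sqrt

# Kyte-Doolittle hydropathy scale (published values).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Hypothetical binary scale, invented for this sketch (NOT a published scale).
TOY_SCALE = {aa: (1.0 if aa in "AILMFWVC" else 0.0) for aa in KYTE_DOOLITTLE}


def mean_hydrophobicity(seq, scale):
    """Feature: average scale value over all residues of the sequence."""
    return sum(scale[aa] for aa in seq) / len(seq)


class Stump:
    """Depth-1 decision tree on one feature (simplified stand-in for a tree)."""

    def fit(self, xs, ys):
        values = sorted(set(xs))
        base = sum(ys) / len(ys)
        # Fallback when no split is possible: both leaves get the base rate.
        self.threshold, self.p_low, self.p_high = values[0], base, base
        best_errors = None
        for t in values[1:]:  # candidate split: x < t versus x >= t
            low = [y for x, y in zip(xs, ys) if x < t]
            high = [y for x, y in zip(xs, ys) if x >= t]
            # Training errors if each leaf predicts its majority class.
            errors = (min(sum(low), len(low) - sum(low))
                      + min(sum(high), len(high) - sum(high)))
            if best_errors is None or errors < best_errors:
                best_errors = errors
                self.threshold = t
                self.p_low = sum(low) / len(low)
                self.p_high = sum(high) / len(high)
        return self

    def predict_proba(self, x):
        """Estimated probability of the positive (soluble) class."""
        return self.p_low if x < self.threshold else self.p_high


def soft_vote(probabilities):
    """Soft voting: average the class probabilities, then threshold at 0.5."""
    mean_p = sum(probabilities) / len(probabilities)
    return (1 if mean_p >= 0.5 else 0), mean_p


def fit_ensemble(seqs, labels, scales):
    """Train one stump per hydrophobicity scale."""
    return [(s, Stump().fit([mean_hydrophobicity(q, s) for q in seqs], labels))
            for s in scales]


def predict(ensemble, seq):
    """Return (predicted class, mean probability) for one sequence."""
    probs = [stump.predict_proba(mean_hydrophobicity(seq, scale))
             for scale, stump in ensemble]
    return soft_vote(probs)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0/1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented toy data: 1 = soluble, 0 = insoluble (not from the study).
TRAIN = [("DKRSEDKR", 1), ("RKDESTNQ", 1), ("GSRKDEAK", 1),
         ("LLIVFAVL", 0), ("IVLFAWMV", 0), ("AVLIGFLV", 0)]
seqs, labels = zip(*TRAIN)
ensemble = fit_ensemble(seqs, labels, [KYTE_DOOLITTLE, TOY_SCALE])
predictions = [predict(ensemble, s)[0] for s in seqs]
print("training MCC:", mcc(labels, predictions))
```

Averaging probabilities rather than counting hard votes lets a confident classifier outweigh several uncertain ones, which is the general motivation for soft voting. With many scales, a library such as scikit-learn would be a more practical basis than this hand-rolled stump.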