Coffland Sarah, Christensen Katie, Hutchinson Brian, Jagodzinski Filip
Computer Science Department, Western Washington University, Washington, 98225, United States.
Joint Global Change Research Institute, Pacific Northwest National Laboratory, Maryland, 20740, United States.
Bioinform Adv. 2025 Jan 2;5(1):vbae198. doi: 10.1093/bioadv/vbae198. eCollection 2025.
Studying the structural and functional implications of protein mutations is an important task in computational biology and bioinformatics. We leverage our previously proposed RoseNet neural network architecture to predict energy metrics of proteins with double amino acid insertions or deletions (InDels). We train models on previously generated benchmark datasets containing the exhaustive double InDel mutations for three proteins, as well as on an additional three proteins for which random mutants, each with two InDels, have been generated. We expand on our previous work by evaluating these three additional proteins and by analyzing domain features that affect RoseNet's prediction capabilities. These features include the secondary structures into which the InDels fall and the solvent accessible surface area (SASA) scores of the residues. We find further evidence that RoseNet generalizes better to unseen residue combinations than to unseen insertion positions. We also observe that RoseNet produces higher-quality predictions when the insertions fall in a β-sheet than when they fall in an α-helix. Additionally, RoseNet often performs better when the insertions fall in regions of high SASA than when they fall in regions of low SASA.
The code used for training and evaluating the models in the study and the data underlying this article are available at https://github.com/hutchresearch/RoseNet.
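The feature analysis described in the abstract, comparing prediction quality across secondary-structure classes and SASA levels, can be illustrated with a minimal sketch. This is not the authors' code: the record fields (`structure`, `sasa`, `predicted`, `actual`), the helper names, the SASA threshold of 50.0, and every numeric value below are hypothetical and made up purely for illustration. The actual pipeline is in the repository linked above.

```python
from collections import defaultdict
from statistics import mean


def mean_abs_error_by_group(records, key):
    """Return the mean |predicted - actual| for each distinct value of rec[key]."""
    errors = defaultdict(list)
    for rec in records:
        errors[rec[key]].append(abs(rec["predicted"] - rec["actual"]))
    return {group: mean(vals) for group, vals in errors.items()}


def sasa_bin(sasa, threshold):
    """Label a SASA score 'high' or 'low' relative to an assumed threshold."""
    return "high" if sasa >= threshold else "low"


# Toy, invented records: one entry per mutant, with the secondary structure
# at the insertion site, its SASA score, and predicted vs. actual energy.
records = [
    {"structure": "helix", "sasa": 12.0, "predicted": -101.0, "actual": -105.0},
    {"structure": "helix", "sasa": 85.0, "predicted": -98.0, "actual": -100.0},
    {"structure": "sheet", "sasa": 20.0, "predicted": -110.0, "actual": -111.0},
    {"structure": "sheet", "sasa": 90.0, "predicted": -95.0, "actual": -96.0},
]

# The threshold of 50.0 is an arbitrary assumption, not a value from the study.
for rec in records:
    rec["sasa_bin"] = sasa_bin(rec["sasa"], threshold=50.0)

by_structure = mean_abs_error_by_group(records, "structure")
by_sasa = mean_abs_error_by_group(records, "sasa_bin")
print(by_structure)
print(by_sasa)
```

Grouping errors by one categorical feature at a time keeps the comparison easy to read. A fuller analysis would likely need to control for interactions between secondary structure and SASA.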