Mu Yongcheng, Nguyen Thu, Hawickhorst Bryan, Wriggers Willy, Sun Jiangwen, He Jing
Department of Computer Science, Old Dominion University, Norfolk, VA 23529, United States.
Department of Mechanical and Aerospace Engineering, Old Dominion University, Norfolk, VA 23529, United States.
Bioinform Adv. 2024 Nov 22;4(1):vbae169. doi: 10.1093/bioadv/vbae169. eCollection 2024.
Although multiple neural networks have been proposed for detecting secondary structures from medium-resolution (5-10 Å) cryo-electron microscopy (cryo-EM) maps, existing deep learning networks rely primarily on cross-entropy loss, which is known to be sensitive to class imbalance. We investigated five loss functions: cross-entropy, Focal loss, Dice loss, and two combined loss functions. Using the U-Net architecture of our DeepSSETracer method and a dataset of 1355 box-cropped atomic-structure/density-map pairs, we found that a newly designed loss function combining Focal loss and Dice loss provides the best overall detection accuracy for secondary structures. For β-sheet voxels, which are generally much harder to detect than helix voxels, the combined loss function achieved a significant improvement over the cross-entropy loss function (an 8.8% increase in the F score) and a noticeable improvement over the Dice loss function. This study demonstrates the potential for designing more effective loss functions for hard cases in the segmentation of secondary structures. The newly trained model was incorporated into DeepSSETracer 1.1 for the segmentation of protein secondary structures in medium-resolution cryo-EM map components. DeepSSETracer can be integrated into ChimeraX, a popular molecular visualization program.
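The idea of combining Focal loss and Dice loss can be illustrated with a minimal NumPy sketch. The abstract does not state how the two terms are combined, so the weighted sum below, the `weight` parameter, the default `gamma=2.0`, and the three-class voxel layout (background, helix, β-sheet) are all assumptions made for illustration and should not be read as the DeepSSETracer implementation.

```python
import numpy as np

def focal_loss(probs, onehot, gamma=2.0, eps=1e-7):
    """Mean multi-class Focal loss: -(1 - p_t)^gamma * log(p_t).

    probs and onehot have shape (..., n_classes); p_t is the predicted
    probability of the true class at each voxel.
    """
    p_t = np.clip((probs * onehot).sum(axis=-1), eps, 1.0)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

def dice_loss(probs, onehot, eps=1e-7):
    """Soft Dice loss, computed per class and then averaged over classes."""
    voxel_axes = tuple(range(probs.ndim - 1))  # every axis except the class axis
    intersection = (probs * onehot).sum(axis=voxel_axes)
    denom = probs.sum(axis=voxel_axes) + onehot.sum(axis=voxel_axes)
    dice = (2.0 * intersection + eps) / (denom + eps)
    return float(1.0 - dice.mean())

def combined_loss(probs, onehot, weight=0.5, gamma=2.0):
    """Hypothetical combination: a weighted sum of the Focal and Dice terms."""
    return (weight * focal_loss(probs, onehot, gamma)
            + (1.0 - weight) * dice_loss(probs, onehot))

# Toy, imbalanced 4x4x4 volume: 0 = background, 1 = helix, 2 = beta-sheet.
rng = np.random.default_rng(0)
labels = rng.choice(3, size=(4, 4, 4), p=[0.8, 0.15, 0.05])
onehot = np.eye(3)[labels]                      # shape (4, 4, 4, 3)
logits = rng.normal(size=(4, 4, 4, 3))          # stand-in for network output
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
print(combined_loss(probs, onehot))
```

With `gamma=0` the Focal term reduces to ordinary cross-entropy, which makes the sketch convenient for comparing the loss functions on the same predictions. The Dice term is normalized per class, so a rare class such as β-sheet contributes as much to the average as an abundant one.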