Purnell Carson, Heebner Jessica, Nguyen Linh, Swulius Michael T, Hylton Ryan, Kabonick Seth, Grillo Michael, Grillo Stephanie, Grigoryev Sergei, Heberle Frederick A, Waxham M Neal, Swulius Matthew T
Penn State College of Medicine, Hershey, PA.
University of Tennessee, Knoxville, TN.
bioRxiv. 2025 Feb 5:2025.01.31.635598. doi: 10.1101/2025.01.31.635598.
Deep learning excels at segmenting objects within noisy cryo-electron tomograms, but the approach is typically bottlenecked by access to ground truth training data. To address this issue, we have developed CryoTomoSim (CTS), an open-source software package that builds coarse-grained models of macromolecular complexes embedded in vitreous ice and then simulates transmitted electron tilt series for tomographic reconstruction. Using CTS outputs, we demonstrate the effects of key microscope parameters (dose, defocus, and pixel size) on deep learning-based segmentation, and show that including both molecular crowding and diversity within synthetic datasets is key to training cellular segmentation networks from purely synthetic inputs. While these networks are very effective as initial models, their accuracy is currently limited, and real cellular data is necessary to train the most accurate and generalizable U-Nets. Using a co-training approach, we first segment over 100 tomograms from neuronal growth cones to quantify their cytoskeletal distributions, and then build a generalized cellular cryo-ET segmentation network called NeuralSeg that can segment a subset of cellular features in tomograms from all domains of life.
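To give a rough sense of why dose and pixel size matter for the noise level of simulated tilt images, the following is a minimal back-of-the-envelope sketch. It is not taken from CTS and does not use its API. The function names, the even split of dose across tilts, and the example values are all illustrative assumptions. It also ignores defocus and the contrast transfer function, detector efficiency, and the increase in apparent sample thickness at high tilt.

```python
import math


def electrons_per_pixel(total_dose_e_per_A2, pixel_size_A, n_tilts):
    """Mean incident electrons per pixel in a single tilt image.

    Assumes (hypothetically) that the total dose is divided evenly
    across all tilt images in the series.
    """
    if n_tilts <= 0:
        raise ValueError("n_tilts must be positive")
    dose_per_tilt = total_dose_e_per_A2 / n_tilts
    # Dose is per unit area, so counts scale with pixel area (size squared).
    return dose_per_tilt * pixel_size_A ** 2


def shot_noise_snr(mean_counts):
    """Signal-to-noise ratio of a Poisson-distributed count.

    For Poisson statistics the standard deviation is sqrt(mean),
    so SNR = mean / sqrt(mean) = sqrt(mean).
    """
    if mean_counts < 0:
        raise ValueError("mean_counts must be non-negative")
    return math.sqrt(mean_counts)


if __name__ == "__main__":
    # Illustrative values only; not parameters reported in the paper.
    for pixel_size in (5.0, 10.0, 20.0):
        counts = electrons_per_pixel(100.0, pixel_size, n_tilts=50)
        print(f"pixel {pixel_size:5.1f} A: "
              f"{counts:8.1f} e-/pixel, shot-noise SNR {shot_noise_snr(counts):.2f}")
```

Under these assumptions, doubling the pixel size quadruples the electrons collected per pixel and doubles the shot-noise SNR, while doubling the total dose improves the SNR only by a factor of √2.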