Selecting XFEL single-particle snapshots by geometric machine learning.

Authors

Cruz-Chú Eduardo R, Hosseinizadeh Ahmad, Mashayekhi Ghoncheh, Fung Russell, Ourmazd Abbas, Schwander Peter

Affiliation

Department of Physics, University of Wisconsin-Milwaukee, 3135 N. Maryland Ave, Milwaukee, Wisconsin 53211, USA.

Publication information

Struct Dyn. 2021 Feb 18;8(1):014701. doi: 10.1063/4.0000060. eCollection 2021 Jan.

Abstract

A promising new route for structural biology is single-particle imaging with an X-ray Free-Electron Laser (XFEL). This method has the advantage that the samples do not require crystallization and can be examined at room temperature. However, high-resolution structures can only be obtained from a sufficiently large number of diffraction patterns of individual molecules, so-called single particles. Here, we present a method that allows for efficient identification of single particles in very large XFEL datasets, operates at low signal levels, and is tolerant to background. This method uses supervised Geometric Machine Learning (GML) to extract low-dimensional feature vectors from a training dataset, fuse test datasets into the feature space of training datasets, and separate the data into binary distributions of "single particles" and "non-single particles." As a proof of principle, we tested simulated and experimental datasets of the Coliphage PR772 virus. We created a training dataset and classified three types of test datasets: First, a noise-free simulated test dataset, which gave near perfect separation. Second, simulated test datasets that were modified to reflect different levels of photon counts and background noise. These modified datasets were used to quantify the predictive limits of our approach. Third, an experimental dataset collected at the Stanford Linear Accelerator Center. The single-particle identification for this experimental dataset was compared with previously published results and it was found that GML covers a wide photon-count range, outperforming other single-particle identification methods. Moreover, a major advantage of GML is its ability to retrieve single particles in the presence of structural variability.
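
The pipeline summarized in the abstract (learn low-dimensional features from a labeled training set, place test data in that same feature space, then split the data into two classes) can be illustrated with a minimal sketch. The abstract does not specify the algorithm, so everything here is an assumption made for illustration: a diffusion-map-style embedding, a Nyström out-of-sample extension, and a nearest-centroid rule stand in for the authors' method, and the function names and synthetic vectors are hypothetical rather than real diffraction data.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between the rows of a and the rows of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)

def fit_embedding(train, eps, n_dims=2):
    """Diffusion-map-style embedding of the training set (illustrative stand-in)."""
    K = np.exp(-pairwise_sq_dists(train, train) / eps)   # Gaussian kernel
    deg = K.sum(1)
    S = K / np.sqrt(np.outer(deg, deg))                  # symmetric normalization
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1][1:n_dims + 1]         # skip the trivial top eigenvector
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(deg)[:, None]                   # eigenvectors of the Markov matrix D^-1 K
    return {"train": train, "eps": eps, "vals": vals, "psi": psi}

def embed_new(model, test):
    """Nystrom extension: map unseen points into the training feature space."""
    K = np.exp(-pairwise_sq_dists(test, model["train"]) / model["eps"])
    P = K / K.sum(1, keepdims=True)                      # row-normalized transition weights
    return P @ model["psi"] / model["vals"]

def classify(model, labels, test):
    """Assign each test point to the nearer class centroid in feature space."""
    centroids = np.stack([model["psi"][labels == c].mean(0) for c in (0, 1)])
    feats = embed_new(model, test)
    return ((feats[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)

# Synthetic stand-ins: class 1 = "single particle", class 0 = "non-single particle".
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0.0, 0.5, (40, 5)), rng.normal(3.0, 0.5, (40, 5))])
labels = np.repeat([0, 1], 40)
test = np.vstack([rng.normal(0.0, 0.5, (10, 5)), rng.normal(3.0, 0.5, (10, 5))])
truth = np.repeat([0, 1], 10)

eps = np.median(pairwise_sq_dists(train, train))         # heuristic kernel width (assumption)
model = fit_embedding(train, eps)
pred = classify(model, labels, test)
```

In this sketch the test points never alter the learned feature space; they are only projected into it, which mirrors the abstract's description of fusing test datasets into the feature space of the training dataset. The kernel width heuristic and the two-centroid decision rule are illustrative choices, not taken from the paper.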

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73a2/7902084/53f72cc1c891/SDTYAE-000008-014701_1-g001.jpg