Nawaz Shah, Rahmani Vahid, Pennicard David, Setty Shabarish Pala Ramakantha, Klaudel Barbara, Graafsma Heinz
Deutsches Elektronen-Synchrotron DESY, Notkestraße 85, 22607 Hamburg, Germany.
Gdańsk University of Technology, Gdańsk, Poland.
J Appl Crystallogr. 2023 Sep 20;56(Pt 5):1494-1504. doi: 10.1107/S1600576723007446. eCollection 2023 Oct 1.
Serial crystallography experiments at X-ray free-electron laser facilities produce massive amounts of data, but only a fraction of these data are useful for downstream analysis. It is therefore essential to differentiate between acceptable and unacceptable data, generally known as 'hit' and 'miss', respectively. Image classification methods from artificial intelligence, more specifically convolutional neural networks (CNNs), classify the data into hit and miss categories in order to achieve data reduction. The quantitative performance established in previous work indicates that CNNs successfully classify serial crystallography data into the desired categories [Ke, Brewster, Yu, Ushizima, Yang & Sauter (2018), pp. 655–670], but no qualitative evidence on the internal workings of these networks has been provided. For example, there are no visualization methods that highlight the features contributing to a specific prediction when classifying data in serial crystallography experiments. Existing deep learning methods, including CNNs that classify serial crystallography data, therefore behave like a 'black box'. To address this, presented here is a qualitative study that unpacks the internal workings of CNNs, with the aim of visualizing the information in the fundamental blocks of a standard network applied to serial crystallography data. The region(s) or part(s) of an image that contribute most to a hit or miss prediction are visualized.
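The abstract does not name the specific visualization techniques used. As one illustration of the general idea of highlighting the image regions that drive a hit or miss prediction, the minimal sketch below implements occlusion sensitivity, a perturbation-based method that is not necessarily one of the methods used in this study. A patch is slid over the detector image, the patch is blanked out, and the drop in the classifier's 'hit' score is recorded at that location. The scorer `toy_hit_score` is a hypothetical stand-in for a trained CNN. The names `occlusion_map`, `patch`, `stride` and `fill` are likewise illustrative assumptions.

```python
import numpy as np


def toy_hit_score(image: np.ndarray, threshold: float = 0.5) -> float:
    """Hypothetical stand-in for a trained CNN (assumption, not the paper's model).

    Returns a 'hit' probability that grows with the number of bright,
    Bragg-peak-like pixels, squashed through a logistic function.
    """
    n_bright = np.count_nonzero(image > threshold)
    return float(1.0 / (1.0 + np.exp(-(n_bright - 5))))


def occlusion_map(image, score_fn, patch=4, stride=4, fill=0.0):
    """Occlusion sensitivity: per-pixel average drop in score when occluded."""
    h, w = image.shape
    base = score_fn(image)           # score of the unmodified image
    heat = np.zeros((h, w))
    counts = np.zeros((h, w))
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill   # blank out one patch
            drop = base - score_fn(occluded)            # large drop = important region
            heat[y:y + patch, x:x + patch] += drop
            counts[y:y + patch, x:x + patch] += 1
    # Average over overlapping patches; pixels never covered stay at 0.
    return heat / np.maximum(counts, 1)


# Usage: a synthetic 16x16 'frame' with six bright spots in the top-left corner.
image = np.zeros((16, 16))
for r, c in [(0, 0), (1, 1), (1, 2), (2, 1), (2, 2), (3, 3)]:
    image[r, c] = 1.0

heat = occlusion_map(image, toy_hit_score)
print(np.round(heat[:4, :4].max(), 3), heat[4:, 4:].max())
```

With the spots confined to one patch, only that patch changes the score when occluded, so the heat map is non-zero there and zero elsewhere. With a real network, `score_fn` would wrap the model's hit probability, and a smaller `stride` than `patch` gives smoother, overlapping maps at a higher computational cost.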