A Python package based on robust statistical analysis for serial crystallography data processing.

Affiliations

The Walter and Eliza Hall Institute of Medical Research, Parkville, Melbourne, Victoria 3052, Australia.

School of Physics and Astronomy, Monash University, Clayton, Victoria 3800, Australia.

Publication information

Acta Crystallogr D Struct Biol. 2023 Sep 1;79(Pt 9):820-829. doi: 10.1107/S2059798323005855. Epub 2023 Aug 16.

Abstract

The term robustness in statistics refers to methods that are generally insensitive to deviations from model assumptions. In other words, robust methods are able to preserve their accuracy even when the data do not perfectly fit the statistical models. Robust statistical analyses are particularly effective when analysing mixtures of probability distributions. Therefore, these methods enable the discretization of X-ray serial crystallography data into two probability distributions: a group comprising true data points (for example the background intensities) and another group comprising outliers (for example Bragg peaks or bad pixels on an X-ray detector). These characteristics of robust statistical analysis are beneficial for the ever-increasing volume of serial crystallography (SX) data sets produced at synchrotron and X-ray free-electron laser (XFEL) sources. The key advantage of the use of robust statistics for some applications in SX data analysis is that it requires minimal parameter tuning because of its insensitivity to the input parameters. In this paper, a software package called Robust Gaussian Fitting library (RGFlib) is introduced that is based on the concept of robust statistics. Two methods are presented based on the concept of robust statistics and RGFlib for two SX data-analysis tasks: (i) a robust peak-finding algorithm and (ii) an automated robust method to detect bad pixels on X-ray pixel detectors.
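To illustrate the idea of separating background from outliers, the following is a minimal sketch rather than the RGFlib implementation: it swaps in a plain median and median-absolute-deviation (MAD) estimator, and the function names, the synthetic image, and the threshold of 6 are all assumptions made for this example. It shows how a robust estimate of the background Gaussian stays close to the true parameters while a few very bright pixels are flagged as outliers.

```python
import numpy as np


def robust_gaussian_estimate(values):
    """Estimate the centre and width of the inlier (background) population.

    Uses the median and the scaled MAD. This is a stand-in for a robust
    Gaussian fit, not the estimator used by RGFlib.
    """
    values = np.asarray(values, dtype=float).ravel()
    center = np.median(values)
    mad = np.median(np.abs(values - center))
    scale = 1.4826 * mad  # makes the MAD consistent with a Gaussian sigma
    return center, scale


def flag_outliers(values, threshold=6.0):
    """Return a boolean mask of points far from the robust background model."""
    values = np.asarray(values, dtype=float)
    center, scale = robust_gaussian_estimate(values)
    if scale == 0.0:
        # Degenerate case: more than half of the values are identical.
        return values != center
    return np.abs(values - center) > threshold * scale


# Synthetic detector patch: Gaussian background plus three bright pixels
# standing in for Bragg peaks (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
image = rng.normal(loc=10.0, scale=2.0, size=(64, 64))
peak_positions = [(5, 7), (20, 40), (50, 12)]
for row, col in peak_positions:
    image[row, col] += 200.0

center, scale = robust_gaussian_estimate(image)
mask = flag_outliers(image)

# For comparison, the non-robust estimates are pulled by the bright pixels.
naive_mean, naive_std = image.mean(), image.std()

print("robust centre/scale:", center, scale)
print("naive mean/std:", naive_mean, naive_std)
print("flagged pixels:", int(mask.sum()))
```

In this sketch the threshold is the only free parameter, and the result changes little over a wide range of values because the background model is barely affected by the outliers. That is the kind of insensitivity to input parameters the abstract refers to.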

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a66d/10478633/d2917e65855d/d-79-00820-fig1.jpg
