Dissecting random and systematic differences between noisy composite data sets.

Affiliation

Department of Biology, University of Konstanz, Universitätsstrasse 19, 78457 Konstanz, Germany.

Publication information

Acta Crystallogr D Struct Biol. 2017 Apr 1;73(Pt 4):286-293. doi: 10.1107/S2059798317000699. Epub 2017 Mar 31.

Abstract

Composite data sets measured on different objects are usually affected by random errors, but may also be influenced by systematic (genuine) differences in the objects themselves, or the experimental conditions. If the individual measurements forming each data set are quantitative and approximately normally distributed, a correlation coefficient is often used to compare data sets. However, the relations between data sets are not obvious from the matrix of pairwise correlations since the numerical value of the correlation coefficient is lowered by both random and systematic differences between the data sets. This work presents a multidimensional scaling analysis of the pairwise correlation coefficients which places data sets into a unit sphere within low-dimensional space, at a position given by their CC* values [as defined by Karplus & Diederichs (2012), Science, 336, 1030-1033] in the radial direction and by their systematic differences in one or more angular directions. This dimensionality reduction can not only be used for classification purposes, but also to derive data-set relations on a continuous scale. Projecting the arrangement of data sets onto the subspace spanned by systematic differences (the surface of a unit sphere) allows, irrespective of the random-error levels, the identification of clusters of closely related data sets. The method gains power with increasing numbers of data sets. It is illustrated with an example from low signal-to-noise ratio image processing, and an application in macromolecular crystallography is shown, but the approach is completely general and thus should be widely applicable.
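The geometric idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes that each data set i is represented by a low-dimensional vector x_i chosen so that the dot product x_i · x_j approximates the pairwise correlation coefficient for i ≠ j, and it fits those vectors by plain gradient descent on the squared residuals. Both the objective and the optimizer are assumptions, since the abstract states neither. The function names (`fit_vectors`, `radial_and_direction`, `synthetic_cc`) and the synthetic, noise-free correlation matrix are hypothetical and serve only as illustration.

```python
import math
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def loss(cc, x):
    """Sum of squared residuals over all pairs i < j (diagonal ignored)."""
    n = len(cc)
    return sum((cc[i][j] - dot(x[i], x[j])) ** 2
               for i in range(n) for j in range(i + 1, n))


def fit_vectors(cc, dim=2, steps=5000, rate=0.02, seed=0):
    """Find vectors x_i such that x_i . x_j approximates cc[i][j] for i != j.

    Assumed objective: least squares on the off-diagonal correlations,
    minimized here by fixed-step gradient descent from a random start.
    """
    rng = random.Random(seed)
    n = len(cc)
    x = [[rng.uniform(0.1, 0.5) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                resid = cc[i][j] - dot(x[i], x[j])
                for k in range(dim):
                    grad[i][k] -= 2.0 * resid * x[j][k]
        for i in range(n):
            for k in range(dim):
                x[i][k] -= rate * grad[i][k]
    return x


def radial_and_direction(v):
    """Split a vector into its length and its unit direction.

    In the picture drawn by the abstract, the length plays the role of the
    random-error level (CC*) and the direction carries systematic differences.
    """
    r = math.sqrt(dot(v, v))
    return r, [c / r for c in v]


def synthetic_cc(lengths, angles_deg):
    """Made-up, noise-free correlations from known lengths and angles (2D)."""
    vecs = [[r * math.cos(math.radians(a)), r * math.sin(math.radians(a))]
            for r, a in zip(lengths, angles_deg)]
    n = len(vecs)
    return [[1.0 if i == j else dot(vecs[i], vecs[j]) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Two hypothetical groups of data sets, 40 degrees apart, each containing
    # a good, a medium, and a poor data set.
    cc = synthetic_cc([0.9, 0.6, 0.3, 0.9, 0.6, 0.3], [0, 0, 0, 40, 40, 40])
    x = fit_vectors(cc)
    print("final loss:", loss(cc, x))
    units = []
    for i, v in enumerate(x):
        r, u = radial_and_direction(v)
        units.append(u)
        print("data set", i, "length", round(r, 3))
    # Cosine of the angle between unit directions, relative to data set 0:
    # values near 1 indicate the same cluster regardless of vector length.
    for i, u in enumerate(units):
        print("data set", i, "cos(angle to 0):", round(dot(units[0], u), 3))
```

Comparing unit directions rather than raw vectors corresponds to the projection onto the surface of the unit sphere described above: a poor and a good data set from the same group have different lengths but point the same way. The solution is only determined up to a rotation or reflection of the whole arrangement, so only lengths and relative angles are meaningful.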


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0cd9/5379934/fc467962e9ed/d-73-00286-fig1.jpg
