Denoising large-scale biological data using network filters.

Author information

Kavran Andrew J, Clauset Aaron

Affiliations

Department of Biochemistry, University of Colorado, Boulder, CO, USA.

BioFrontiers Institute, University of Colorado, Boulder, CO, USA.

Publication information

BMC Bioinformatics. 2021 Mar 25;22(1):157. doi: 10.1186/s12859-021-04075-x.

Abstract

BACKGROUND

Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation.

RESULTS

We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or "filtered" to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may first be decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy by up to 43% compared to using unfiltered data.
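To make the filtering idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not necessarily the authors' exact formulation: it assumes a simple mean filter in which each node's value is averaged with its neighbors' values, and it handles anti-correlated neighbors by reflecting their values across the global mean before averaging. The function name `mean_network_filter` and the dictionary-based graph representation are hypothetical choices made for this example.

```python
from statistics import mean


def mean_network_filter(values, neighbors, anticorrelated=False):
    """Average each node's value with its neighbors' values.

    values:    dict mapping node -> measured value (float)
    neighbors: dict mapping node -> list of adjacent nodes
    anticorrelated: if True, reflect each neighbor's value across the
        global mean before averaging (an assumption of this sketch for
        handling anti-correlated neighbors).
    """
    center = mean(values.values())
    filtered = {}
    for node, x in values.items():
        nbr_vals = [values[n] for n in neighbors.get(node, [])]
        if anticorrelated:
            # A neighbor above the mean is treated as evidence that
            # this node lies below the mean, and vice versa.
            nbr_vals = [2 * center - v for v in nbr_vals]
        # The node's own value counts once, alongside its neighbors.
        filtered[node] = (x + sum(nbr_vals)) / (1 + len(nbr_vals))
    return filtered


# A triangle of mutually correlated measurements.
triangle_values = {"a": 1.0, "b": 2.0, "c": 3.0}
triangle_neighbors = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
smoothed = mean_network_filter(triangle_values, triangle_neighbors)

# Two anti-correlated measurements joined by one edge.
pair_values = {"a": 1.0, "b": 3.0}
pair_neighbors = {"a": ["b"], "b": ["a"]}
sharpened = mean_network_filter(pair_values, pair_neighbors, anticorrelated=True)
```

In the triangle, every node is pulled to the shared average. In the anti-correlated pair, the reflected neighbor value agrees with each node's own value, so the contrast between the two nodes is preserved rather than averaged away.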

CONCLUSIONS

Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters for applications in systems biology.
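The partition-then-filter idea can be sketched as follows. This is a hedged illustration only: it assumes module labels are already available (for example, from a community detection step that is not shown), that edges crossing module boundaries are ignored, and that each module uses a mean filter centered on its own module mean. The names `filter_by_module`, `module_of`, and `anticorrelated_modules` are hypothetical.

```python
from statistics import mean


def filter_by_module(values, edges, module_of, anticorrelated_modules=frozenset()):
    """Apply a separate mean filter inside each module of a network.

    values:    dict mapping node -> measured value (float)
    edges:     iterable of (u, v) pairs, treated as undirected
    module_of: dict mapping node -> module label (assumed given)
    anticorrelated_modules: labels of modules whose neighbors are
        reflected across the module mean before averaging.
    """
    # Keep only edges whose endpoints share a module.
    neighbors = {node: [] for node in values}
    for u, v in edges:
        if module_of[u] == module_of[v]:
            neighbors[u].append(v)
            neighbors[v].append(u)

    # Compute one center per module instead of a single global mean.
    members = {}
    for node in values:
        members.setdefault(module_of[node], []).append(node)
    centers = {m: mean(values[n] for n in nodes) for m, nodes in members.items()}

    filtered = {}
    for node, x in values.items():
        m = module_of[node]
        nbr_vals = [values[n] for n in neighbors[node]]
        if m in anticorrelated_modules:
            nbr_vals = [2 * centers[m] - v for v in nbr_vals]
        filtered[node] = (x + sum(nbr_vals)) / (1 + len(nbr_vals))
    return filtered


# Two modules with very different signal levels, joined by one edge (b, c).
values = {"a": 1.0, "b": 3.0, "c": 10.0, "d": 12.0}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
module_of = {"a": 0, "b": 0, "c": 1, "d": 1}
partitioned = filter_by_module(values, edges, module_of)
```

Because the edge between `b` and `c` crosses a module boundary, the high values in module 1 do not leak into module 0. This is the kind of heterogeneity that a single filter over the whole network would blur.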
