NPM: latent batch effects correction of omics data by nearest-pair matching.

Authors

Zito Antonino, Martinelli Axel, Masiero Mauro, Akhmedov Murodzhon, Kwee Ivo

Affiliation

BigOmics Analytics, Via Serafino Balestra 12, Lugano 6900, Switzerland.

Publication

Bioinformatics. 2025 Mar 4;41(3). doi: 10.1093/bioinformatics/btaf084.

Abstract

MOTIVATION

Batch effects (BEs) are a predominant source of noise in omics data and often mask real biological signals. BEs remain common in existing datasets. Current methods for BE correction mostly rely on specific assumptions or complex models, and may not detect and adjust BEs adequately, which impairs downstream analysis and discovery power. To address these challenges, we developed NPM, a nearest-neighbor matching-based method that adjusts BEs and may outperform other methods in a wide range of datasets.

RESULTS

We assessed distinct metrics and graphical readouts, and compared our method to commonly used BE correction methods. NPM demonstrates the ability to correct for BEs while preserving biological differences. It may outperform other methods based on multiple metrics. Altogether, NPM proves to be a valuable BE correction approach to maximize discovery in biomedical research, with applicability in clinical research where latent BEs are often dominant.

AVAILABILITY AND IMPLEMENTATION

NPM is freely available on GitHub (https://github.com/bigomics/NPM) and on Omics Playground (https://bigomics.ch/omics-playground). Code for the analyses is available at https://github.com/bigomics/NPM. The datasets underlying this article are the following: GSE120099, GSE82177, GSE162760, GSE171343, GSE153380, GSE163214, GSE182440, GSE163857, GSE117970, GSE173078, and GSE10846. All these datasets are publicly available and can be freely accessed on the Gene Expression Omnibus repository.
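
ILLUSTRATIVE SKETCH

The abstract names nearest-neighbor matching as the basis of NPM but does not describe the algorithm. The following is therefore not the NPM implementation; it is a minimal toy sketch of how cross-batch nearest-neighbor pairs could, in principle, be used to estimate and remove a batch shift. The function name `nearest_pair_shift_correction`, the restriction to two known batches, the per-batch centering step, and the simple mean-difference correction are all assumptions made for illustration only.

```python
import numpy as np


def nearest_pair_shift_correction(X, batch):
    """Toy batch correction from cross-batch nearest-neighbor pairs.

    Hypothetical helper, NOT the NPM algorithm.
    X     : (n_samples, n_features) array of expression values
    batch : length n_samples array holding exactly two batch labels
    Returns a copy of X in which the second batch is shifted toward the first.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)
    if labels.size != 2:
        raise ValueError("this sketch handles exactly two batches")

    ref_idx = np.flatnonzero(batch == labels[0])  # reference batch
    qry_idx = np.flatnonzero(batch == labels[1])  # batch to be adjusted

    # Center each batch so that matching relies on within-batch structure
    # rather than on the batch offset itself (an assumption of this sketch).
    ref_c = X[ref_idx] - X[ref_idx].mean(axis=0)
    qry_c = X[qry_idx] - X[qry_idx].mean(axis=0)

    # Squared Euclidean distances between every query and reference sample.
    d2 = ((qry_c[:, None, :] - ref_c[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of the matched reference sample

    # Per-feature batch shift: mean difference across the matched pairs.
    shift = (X[qry_idx] - X[ref_idx][nearest]).mean(axis=0)

    corrected = X.copy()
    corrected[qry_idx] -= shift
    return corrected


# Tiny synthetic example: batch "B" equals batch "A" plus a constant offset.
base = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
X = np.vstack([base, base + np.array([5.0, -3.0])])
batch = np.array(["A"] * 4 + ["B"] * 4)
X_corrected = nearest_pair_shift_correction(X, batch)
```

In this synthetic case the two batches differ only by a constant per-feature offset, so the matched pairs recover that offset directly. Real omics data would need the batches to share comparable biological composition for such a naive shift estimate to be meaningful, which is one reason dedicated methods exist.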
