Cosbin: cosine score-based iterative normalization of biologically diverse samples.

Author information

Wu Chiung-Ting, Shen Minjie, Du Dongping, Cheng Zuolin, Parker Sarah J, Lu Yingzhou, Van Eyk Jennifer E, Yu Guoqiang, Clarke Robert, Herrington David M, Wang Yue

Affiliations

Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.

Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA.

Publication information

Bioinform Adv. 2022 Oct 20;2(1):vbac076. doi: 10.1093/bioadv/vbac076. eCollection 2022.

DOI:10.1093/bioadv/vbac076

PMID:36330358

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9614059/

Abstract

MOTIVATION

Data normalization is essential to ensure accurate inference and comparability of gene expression measures across samples or conditions. Ideally, gene expression data should be rescaled based on consistently expressed reference genes. However, in biologically diverse samples, the most commonly used reference genes exhibit striking expression variability, and size-factor or distribution-based normalization methods can be problematic when differential expression is strongly asymmetric.
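The bias described above can be seen in a small worked example. The following is a minimal sketch with made-up numbers, not data from the paper: two conditions share four unchanged genes, while two further genes are up-regulated only in condition B. The function name `total_sum_scale` and the target of 1000 are assumptions chosen for illustration; total-sum scaling stands in here for size-factor style methods in general.

```python
# Toy data (hypothetical): true expression of 6 genes in two conditions.
# Genes 0-3 are unchanged; genes 4 and 5 are up-regulated only in condition B.
true_a = [100.0, 200.0, 50.0, 400.0, 100.0, 150.0]
true_b = [100.0, 200.0, 50.0, 400.0, 800.0, 1200.0]


def total_sum_scale(values, target=1000.0):
    """Rescale one sample so that its values sum to `target`."""
    total = sum(values)
    return [v * target / total for v in values]


norm_a = total_sum_scale(true_a)
norm_b = total_sum_scale(true_b)

# B/A ratio per gene after normalization. An unbiased method would give 1.0
# for the unchanged genes 0-3; total-sum scaling gives 1000/2750 instead.
ratios = [b / a for a, b in zip(norm_a, norm_b)]
for gene, ratio in enumerate(ratios):
    print(f"gene {gene}: B/A = {ratio:.3f}")
```

Because the one-sided up-regulation inflates the total of condition B, the unchanged genes appear down-regulated after scaling, and the truly up-regulated genes appear less changed than they are.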

RESULTS

We report an efficient and accurate data-driven method, Cosine score-based iterative normalization (Cosbin), to normalize biologically diverse samples. Based on the Cosine scores of cross-condition expression patterns, the Cosbin pipeline iteratively eliminates asymmetric differentially expressed genes, identifies consistently expressed genes, and calculates sample-wise normalization factors. We demonstrate the superior performance and enhanced utility of Cosbin compared with six representative peer methods using both simulation and real multi-omics expression datasets. Implemented in open-source R scripts and specifically designed to address normalization bias due to significant asymmetry in differential expression across multiple conditions, the Cosbin tool complements rather than replaces the existing methods and will allow biologists to more accurately detect true molecular signals among diverse phenotypic groups.
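The iterative idea summarized above can be sketched roughly as follows. This is a simplified illustration in Python, not the authors' R implementation: it assumes a strictly positive genes-by-conditions matrix, scores each gene by the cosine between its scaled cross-condition pattern and the all-ones vector, drops the least flat genes, and re-estimates the scaling factors from the genes that remain. The names `cosine_to_ones`, `column_factors`, and `cosbin_like_factors`, and the parameters `drop_fraction`, `keep_threshold`, and `max_iter`, are hypothetical.

```python
import math


def cosine_to_ones(vec):
    """Cosine between `vec` and the all-ones vector (1.0 = same level everywhere)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return 0.0
    return sum(vec) / (math.sqrt(len(vec)) * norm)


def column_factors(expr, genes):
    """Factors that equalize the column totals computed over `genes` only."""
    n_cols = len(expr[0])
    totals = [sum(expr[g][k] for g in genes) for k in range(n_cols)]
    mean_total = sum(totals) / n_cols
    return [mean_total / t for t in totals]


def cosbin_like_factors(expr, drop_fraction=0.2, keep_threshold=0.999, max_iter=20):
    """Return (factors, kept_genes) for `expr` (rows = genes, columns = conditions).

    All three tuning parameters are hypothetical choices for this sketch.
    """
    active = list(range(len(expr)))
    factors = column_factors(expr, active)  # start from total-sum scaling
    for _ in range(max_iter):
        scores = {
            g: cosine_to_ones([x * f for x, f in zip(expr[g], factors)])
            for g in active
        }
        # Stop once every remaining gene looks consistently expressed.
        if min(scores.values()) >= keep_threshold or len(active) <= 2:
            break
        # Eliminate the genes with the least flat cross-condition pattern.
        n_drop = max(1, int(len(active) * drop_fraction))
        ranked = sorted(active, key=lambda g: scores[g], reverse=True)
        active = ranked[: len(active) - n_drop]
        # Re-estimate the factors from the remaining candidate genes.
        factors = column_factors(expr, active)
    return factors, sorted(active)


# Toy matrix (hypothetical): genes 0-3 are unchanged, genes 4-5 are up in the
# third condition; columns were distorted by scale factors 1.0, 2.0 and 0.5.
observed = [
    [100.0, 200.0, 50.0],
    [200.0, 400.0, 100.0],
    [50.0, 100.0, 25.0],
    [400.0, 800.0, 200.0],
    [100.0, 200.0, 400.0],
    [150.0, 300.0, 600.0],
]
factors, kept = cosbin_like_factors(observed)
print(kept, [round(f, 4) for f in factors])
```

On this toy matrix the two asymmetrically up-regulated genes are eliminated first, so the final factors are estimated only from the four unchanged genes and undo the column distortions up to a common constant. Starting from plain total-sum scaling matters here: on the raw, unscaled columns the distortion and the differential expression would be confounded in the cosine score.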

AVAILABILITY AND IMPLEMENTATION

The R scripts of the Cosbin pipeline are freely available at https://github.com/MinjieSh/Cosbin.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.
