Cosbin: cosine score-based iterative normalization of biologically diverse samples.

Author information

Wu Chiung-Ting, Shen Minjie, Du Dongping, Cheng Zuolin, Parker Sarah J, Lu Yingzhou, Van Eyk Jennifer E, Yu Guoqiang, Clarke Robert, Herrington David M, Wang Yue

Affiliations

Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.

Advanced Clinical Biosystems Research Institute, Cedars Sinai Medical Center, Los Angeles, CA 90048, USA.

Publication information

Bioinform Adv. 2022 Oct 20;2(1):vbac076. doi: 10.1093/bioadv/vbac076. eCollection 2022.

Abstract

MOTIVATION

Data normalization is essential to ensure accurate inference and comparability of gene expression measures across samples or conditions. Ideally, gene expression data should be rescaled based on consistently expressed reference genes. However, when biologically diverse samples are normalized, the most commonly used reference genes exhibit striking expression variability, and size-factor or distribution-based normalization methods can be problematic when differential expression is substantially asymmetric.
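To make the asymmetry problem concrete, the following Python sketch uses simple total-count scaling as a stand-in for size-factor methods (an assumption; more robust estimators such as median-of-ratios resist small DE fractions but degrade in the same direction as the asymmetric fraction grows). The toy counts and the helper name total_count_factors are hypothetical.

```python
import numpy as np

def total_count_factors(counts):
    """Per-sample scaling factors: each sample's total divided by the mean total."""
    totals = counts.sum(axis=0)
    return totals / totals.mean()

# Hypothetical toy data: 90 consistently expressed genes (CEGs) identical in both
# samples, plus 10 genes up-regulated 10-fold only in sample 2 (asymmetric DE).
# Both samples have the same true depth, so the ideal factors are [1, 1].
ceg = np.full((90, 2), 100.0)
de = np.column_stack([np.full(10, 100.0), np.full(10, 1000.0)])
counts = np.vstack([ceg, de])

factors = total_count_factors(counts)
normalized = counts / factors

# The CEGs now appear down-regulated in sample 2 by the ratio of totals (10/19).
ceg_fold_change = normalized[:90, 1] / normalized[:90, 0]
print(round(float(ceg_fold_change[0]), 3))  # → 0.526
```

If the factors were computed from the 90 CEGs alone, both samples would receive a factor of exactly 1, which is why identifying consistently expressed genes before estimating factors matters.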

RESULTS

We report an efficient and accurate data-driven method, Cosine score-based iterative normalization (Cosbin), to normalize biologically diverse samples. Based on the cosine scores of cross-condition expression patterns, the Cosbin pipeline iteratively eliminates asymmetric differentially expressed genes, identifies consistently expressed genes, and calculates sample-wise normalization factors. We demonstrate the superior performance and enhanced utility of Cosbin compared with six representative peer methods using both simulated and real multi-omics expression datasets. Implemented in open-source R scripts and specifically designed to address normalization bias caused by significant asymmetry in differential expression across multiple conditions, the Cosbin tool complements rather than replaces existing methods and will allow biologists to more accurately detect true molecular signals among diverse phenotypic groups.
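The iterative loop described above can be illustrated with a minimal Python sketch. It is not the authors' R implementation, and several details are assumptions not stated in the abstract: a gene's cosine score is taken as the cosine between its vector of per-condition mean expression and the all-ones vector (1 means equal levels in every condition), sample-wise factors are total-count scaling over the currently retained genes, and a fixed fraction of the lowest-scoring genes is dropped per iteration. The names cosbin_sketch, drop_frac, min_cos, and min_genes are hypothetical.

```python
import numpy as np

def cosine_to_ones(means):
    """Cosine between each gene's condition-mean vector and the all-ones vector."""
    k = means.shape[1]
    norms = np.linalg.norm(means, axis=1)
    return means.sum(axis=1) / (np.sqrt(k) * norms)

def sample_factors(sub):
    """Assumed factor rule: total-count scaling over the retained genes."""
    totals = sub.sum(axis=0)
    return totals / totals.mean()

def cosbin_sketch(data, groups, drop_frac=0.05, min_cos=0.999,
                  min_genes=10, max_iter=100):
    """Hypothetical sketch: drop low-cosine genes, then return sample factors.

    data: genes x samples matrix of non-negative expression values.
    groups: condition label for each sample (column).
    Returns (per-sample factors, boolean mask of retained genes).
    """
    data = np.asarray(data, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    keep = data.sum(axis=1) > 0  # all-zero genes carry no information
    for _ in range(max_iter):
        sub = data[keep]
        norm = sub / sample_factors(sub)
        # Mean normalized expression of each retained gene in each condition.
        means = np.column_stack(
            [norm[:, groups == g].mean(axis=1) for g in labels])
        cos = cosine_to_ones(means)
        n_drop = max(1, int(drop_frac * keep.sum()))
        if cos.min() >= min_cos or keep.sum() - n_drop < min_genes:
            break
        # Remove the genes whose cross-condition pattern is least uniform.
        kept_idx = np.flatnonzero(keep)
        keep[kept_idx[np.argsort(cos)[:n_drop]]] = False
    return sample_factors(data[keep]), keep

# Hypothetical toy data: 100 consistently expressed genes plus 20 genes
# up-regulated 5-fold only in condition B; condition B samples also carry a
# 2x technical scaling that normalization should recover.
base = np.arange(1, 101, dtype=float) * 10
ceg = np.outer(base, [1, 1, 2, 2])
de = np.outer(np.full(20, 100.0), [1, 1, 5 * 2, 5 * 2])
data = np.vstack([ceg, de])
groups = ["A", "A", "B", "B"]

factors, keep = cosbin_sketch(data, groups)
print(round(float(factors[2] / factors[0]), 6))  # → 2.0
print(keep[100:].any())                          # → False
```

On this toy data the asymmetric genes are removed within a few iterations and the recovered factor ratio between conditions B and A equals the injected 2x scaling, whereas total-count factors computed on all 120 genes would give a ratio of about 2.3. Dropping a fraction per iteration, rather than every gene below a threshold at once, avoids discarding consistently expressed genes while the early factors are still biased.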

AVAILABILITY AND IMPLEMENTATION

The R scripts of Cosbin pipeline are freely available at https://github.com/MinjieSh/Cosbin.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.
