Mutual information for detecting multi-class biomarkers when integrating multiple bulk or single-cell transcriptomic studies.

Author information

Zou Jian, Li Zheqi, Carleton Neil, Oesterreich Steffi, Lee Adrian V, Tseng George C

Affiliations

Department of Statistics, School of Public Health, Chongqing Medical University, Chongqing, Chongqing 400016, China.

Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, United States.

Publication information

Bioinformatics. 2024 Nov 28;40(12). doi: 10.1093/bioinformatics/btae696.

DOI:10.1093/bioinformatics/btae696

PMID:39563471

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11629966/

Abstract

MOTIVATION

Biomarker detection plays a pivotal role in biomedical research. Integrating omics studies from multiple cohorts can enhance the statistical power, accuracy, and robustness of the detection results. However, existing methods for horizontally combining omics studies are mostly designed for two-class scenarios (e.g. cases versus controls) and are not directly applicable to studies with a multi-class design (e.g. samples from multiple disease subtypes, treatments, tissues, or cell types).

RESULTS

We propose a statistical framework, Mutual Information Concordance Analysis (MICA), to detect biomarkers with concordant multi-class expression patterns across multiple omics studies from an information-theoretic perspective. Our approach first uses a global test based on mutual information to detect biomarkers whose multi-class patterns are concordant across some or all of the omics studies. A post hoc analysis is then performed for each detected biomarker to identify the studies with a concordant pattern. Extensive simulations demonstrate improved accuracy and successful false discovery rate control of MICA compared with an existing multi-class correlation method. The method is then applied to two practical scenarios: mouse metabolism-related transcriptomic studies across four tissues, and estrogen treatment expression profiles from three sources. Biomarkers detected by MICA yield intriguing biological insights and functional annotations. Additionally, we applied MICA to single-cell RNA-Seq data to detect tumor progression biomarkers, highlighting critical roles of ribosomal function in the tumor microenvironment of triple-negative breast cancer and underscoring the potential of MICA for detecting novel therapeutic targets.
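For readers unfamiliar with the underlying quantity, the following is a minimal sketch of a plug-in mutual information estimate between one gene's discretized expression and multi-class labels in a single study, with a permutation p-value. It is not the MICA statistic or the authors' implementation: the function names, the equal-frequency binning, the number of bins, and the toy data are all illustrative assumptions, and the cross-study concordance test described above is not reproduced here.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
    return mi


def discretize(values, n_bins=3):
    """Equal-frequency binning by rank (an assumed, hypothetical choice)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // len(values), n_bins - 1)
    return bins


def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """Return (observed MI, permutation p-value) obtained by shuffling the labels y."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if mutual_information(x, y_perm) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (made up): one gene, three classes, four samples per class.
labels = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
expression = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1, 9.0, 9.3, 8.8, 9.1]

binned = discretize(expression, n_bins=3)
mi, p_value = permutation_pvalue(binned, labels)
print(f"MI = {mi:.4f} nats, permutation p = {p_value:.4f}")
```

In this toy example the expression bins separate the three classes perfectly, so the estimate reaches its maximum, the entropy of the class labels. A gene unrelated to the class labels would give a value near zero.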

AVAILABILITY AND IMPLEMENTATION

The source code is available on Figshare at https://doi.org/10.6084/m9.figshare.27635436. Additionally, the R package can be installed directly from GitHub at https://github.com/jianzou75/MICA.
