Bi-EB: Empirical Bayesian Biclustering for Multi-Omics Data Integration Pattern Identification among Species.

Affiliations

Center for Computational Biology and Bioinformatics, School of Medicine, Indiana University, Indianapolis, IN 46202, USA.

Department of Bio-Health Informatics, School of Informatics, Indiana University, Indianapolis, IN 46202, USA.

Publication information

Genes (Basel). 2022 Oct 30;13(11):1982. doi: 10.3390/genes13111982.

Abstract

Although several biclustering algorithms have been studied, few are used to identify cross patterns across species through multi-omics data mining. A fast empirical Bayesian biclustering (Bi-EB) algorithm is developed to detect patterns shared both across integrated omics data and between species. Bi-EB takes a bioinformatics approach to a clinically critical translational question: how modules of genotype variation associated with phenotype can be identified from cancer cell screening data, and how these findings can be translated directly to a cancer patient subpopulation. An empirical Bayesian probabilistic interpretation and a ratio strategy are proposed in Bi-EB for the first time to detect pairwise regulation patterns among species and gene-level variations across multiple omics layers, such as protein and mRNA. An expectation-maximization (EM) optimization algorithm extracts the foreground concurrent variations from the background noise by adjusting two parameters: the bicluster membership probability threshold and the bicluster average probability. Three simulation experiments and two analyses of real mRNA and protein data from the well-known The Cancer Genome Atlas (TCGA) and Cancer Cell Line Encyclopedia (CCLE) verify that the proposed Bi-EB algorithm significantly improves clustering recovery and relevance accuracy, outperforming seven other biclustering methods (Cheng and Church (CC), xMOTIFs, BiMax, Plaid, Spectral, FABIA, and QUBIC) with a recovery score of 0.98 and a relevance score of 0.99. Bi-EB is also used to determine the shared causality patterns from mRNA to protein between patients and cancer cell lines in the TCGA and CCLE breast cancer data. A module of clinically well-known treatment target proteins, comprising estrogen receptor (ER), ER (p118), AR, BCL2, cyclin E1, and IGFBP2, is identified as varying in accordance with mRNA expression in the luminal-like subtype. For the first time, ten genes, including CCNB1, CDH1, KDR, RAB25, and PRKCA, are found to maintain high mRNA-protein accordance in both breast cancer patients and cell lines of the basal-like subtype. Bi-EB provides a useful biclustering analysis tool for discovering cross patterns hidden across multiple data matrices (omics) and across species. Implementing the Bi-EB method in the clinical setting will have a direct impact on the conduct of translational research guided by cancer cell screening.
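The foreground/background separation described in the abstract can be illustrated with a generic two-component Gaussian mixture fitted by EM. The sketch below is not the Bi-EB model itself, which the abstract does not specify: the input is assumed to be a one-dimensional score per matrix entry (for instance some protein-to-mRNA ratio), and the names `em_foreground`, `select_members`, `average_probability` and the default threshold of 0.9 are hypothetical, chosen only for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_mean_sd(values, weights, min_sd):
    """Weighted mean and standard deviation, with a floor on the sd."""
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, max(math.sqrt(var), min_sd)


def em_foreground(values, n_iter=100, min_sd=1e-3):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative only).

    Returns (post, params), where post[i] is the estimated probability
    that values[i] belongs to the foreground component. Assumes a
    non-empty list of floats.
    """
    n = len(values)
    # Initial guess: background at the overall mean, foreground at the maximum.
    mu_bg, sd_bg = weighted_mean_sd(values, [1.0] * n, min_sd)
    mu_fg, sd_fg, pi_fg = max(values), sd_bg, 0.5
    post = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior foreground probability of every value.
        post = []
        for v in values:
            fg = pi_fg * normal_pdf(v, mu_fg, sd_fg)
            bg = (1.0 - pi_fg) * normal_pdf(v, mu_bg, sd_bg)
            post.append(fg / (fg + bg) if fg + bg > 0.0 else 0.0)
        w_fg = sum(post)
        if w_fg < 1e-9 or n - w_fg < 1e-9:
            break  # one component has become empty; stop updating
        # M-step: re-estimate mixing weight, means and standard deviations.
        pi_fg = w_fg / n
        mu_fg, sd_fg = weighted_mean_sd(values, post, min_sd)
        mu_bg, sd_bg = weighted_mean_sd(values, [1.0 - p for p in post], min_sd)
    params = {"pi_fg": pi_fg, "mu_fg": mu_fg, "sd_fg": sd_fg,
              "mu_bg": mu_bg, "sd_bg": sd_bg}
    return post, params


def select_members(post, membership_threshold=0.9):
    """Indices whose foreground probability reaches the threshold."""
    return [i for i, p in enumerate(post) if p >= membership_threshold]


def average_probability(post, members):
    """Mean foreground probability over the selected members."""
    return sum(post[i] for i in members) / len(members) if members else 0.0


if __name__ == "__main__":
    random.seed(0)
    # Synthetic data: 200 background values and 40 shifted foreground values.
    scores = [random.gauss(0.0, 1.0) for _ in range(200)]
    scores += [random.gauss(6.0, 0.5) for _ in range(40)]
    post, params = em_foreground(scores)
    members = select_members(post, membership_threshold=0.9)
    print(len(members), round(average_probability(post, members), 3))
```

In this toy setting, the membership threshold decides which entries are kept as foreground, and the average probability of the kept entries gives a single figure that could be used to accept or reject the candidate group. How Bi-EB actually combines its two parameters is described in the paper, not here.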

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/497b/9690013/983fd50a0297/genes-13-01982-g001.jpg
