Shambhala-2: A Protocol for Uniformly Shaped Harmonization of Gene Expression Profiles of Various Formats.

Authors

Borisov Nicolas, Sorokin Maksim, Zolotovskaya Marianna, Borisov Constantin, Buzdin Anton

Affiliations

Omicsway Corp., Walnut, California.

Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia.

Publication information

Curr Protoc. 2022 May;2(5):e444. doi: 10.1002/cpz1.444.

Abstract

Uniformly shaped harmonization of gene expression profiles is central to the simultaneous comparison of multiple gene expression datasets. Such a method is expected to handle gene expression data obtained using various experimental methods and equipment, and to return harmonized profiles in a uniform shape. Uniformly shaped expression profiles from different initial datasets can then be compared directly. However, current harmonization techniques have strong limitations that prevent their broad use in bioinformatic applications. They either operate on no more than two datasets/platforms or return data in a dynamic format that differs for every comparison under analysis. Neither approach allows new data to be added to previously harmonized dataset(s), which complicates the analysis and increases calculation costs. We propose here a new method termed Shambhala-2 that can transform multi-platform expression data into a universal format that is identical for all harmonizations made using this technique. Shambhala-2 is based on sample-by-sample cubic conversion of the initial expression dataset into a preselected shape of the reference definitive dataset. Using 8390 samples of 12 healthy human tissue types and 4086 samples of colorectal, kidney, and lung cancer tissues, we verified Shambhala-2's capacity to restore tissue-specific expression patterns for seven microarray and three RNA sequencing platforms. Shambhala-2 performed well for all tested combinations of RNAseq and microarray profiles and retained gene-expression ranks, as evidenced by high correlations between different single- or aggregated gene expression metrics in pre- and post-Shambhalized samples, including the preservation of cancer-specific gene expression and pathway activation features. © 2022 Wiley Periodicals LLC.
Basic Protocol: Shambhala-2 harmonizer
Alternate Protocol 1: Linear Shambhala/Shambhala-1
Alternate Protocol 2: Alternative (flexible-format and uniformly shaped) normalization methods
Support Protocol 1: Watermelon multisection (WM)
Support Protocol 2: Calculation of cancer-to-normal log-fold-change (LFC) and pathway activation level (PAL)
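
The cancer-to-normal log-fold-change named in Support Protocol 2 can be illustrated with a minimal Python sketch. The abstract does not give the protocol's exact formula, so everything here is an assumption for illustration only: the function name `log_fold_change`, the use of an arithmetic mean over the normal samples, the pseudocount of 1, and the toy gene names and values. It is not the authors' implementation.

```python
import math
from statistics import mean


def log_fold_change(tumor_value, normal_values, pseudocount=1.0):
    """Log2 ratio of one tumor expression value to the mean of normal samples.

    Assumption: arithmetic mean of normals and a pseudocount of 1 to avoid
    taking the logarithm of zero; the published protocol may differ.
    """
    normal_mean = mean(normal_values)
    return math.log2((tumor_value + pseudocount) / (normal_mean + pseudocount))


# Hypothetical harmonized expression values for two genes.
tumor = {"GENE_A": 31.0, "GENE_B": 3.0}
normals = {"GENE_A": [7.0, 7.0], "GENE_B": [15.0, 15.0]}

lfc = {gene: log_fold_change(tumor[gene], normals[gene]) for gene in tumor}
print(lfc)  # GENE_A is up-regulated (positive LFC), GENE_B is down-regulated
```

A positive value means the gene is expressed more strongly in the tumor sample than in the normal reference, and a negative value means the opposite. Because the ratio is computed per gene, tumor and normal profiles must first be brought onto a common scale, which is the role the abstract assigns to harmonization.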
