A regulation probability model-based meta-analysis of multiple transcriptomics data sets for cancer biomarker identification.

Author information

Xie Xin-Ping, Xie Yu-Feng, Wang Hong-Qiang

Affiliations

School of Mathematics and Physics, Anhui Jianzhu University, Hefei, Anhui, 230022, China.

Cancer Hospital, CAS, Hefei, Anhui, 230031, China.

Publication information

BMC Bioinformatics. 2017 Aug 23;18(1):375. doi: 10.1186/s12859-017-1794-6.

DOI:10.1186/s12859-017-1794-6

PMID:28830341

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5568075/

Abstract

BACKGROUND

The large-scale accumulation of omics data poses a pressing challenge for the integrative analysis of multiple data sets in bioinformatics. An open question in such integrative analysis is how to pinpoint consistent but subtle gene activity patterns across studies. Achieving this goal requires careful handling of study heterogeneity.

RESULTS

This paper proposes jGRP, a regulation probability model-based meta-analysis method for identifying differentially expressed genes (DEGs). The method integrates multiple transcriptomics data sets in a gene regulation space rather than in a gene expression space, which makes it easier to capture and manage data heterogeneity across studies from different laboratories or platforms. Specifically, we transform gene expression profiles into a unified gene regulation profile across studies by mathematically defining two gene regulation events between two conditions and estimating the probability that each event occurs in a sample. Finally, a novel differential expression statistic is built on the gene regulation profiles, enabling accurate and flexible identification of DEGs in the gene regulation space. We evaluated the proposed method on simulated data and real-world cancer data sets and demonstrated the effectiveness and efficiency of jGRP for DEG identification in the context of meta-analysis.
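The abstract does not give jGRP's actual definitions, so the following is only a minimal Python sketch of the general idea of scoring genes from per-sample regulation probabilities instead of raw expression values. It rests on assumptions of our own. The "up" and "down" events for a gene in a case (e.g. tumor) sample are estimated as the fraction of that study's own control samples lying below or above the sample's value; ties count as "no change". The differential score is simply the mean of P(up) - P(down) pooled over all case samples of all studies. The names `regulation_probs`, `regulation_profile`, `differential_score`, the `Study` layout and the toy data are all hypothetical. This is not the estimator or statistic defined in the paper.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: a study = (control expression, case expression),
# each mapping gene name -> list of sample values from that study.
Study = Tuple[Dict[str, List[float]], Dict[str, List[float]]]


def regulation_probs(value: float, controls: Sequence[float]) -> Tuple[float, float]:
    """Assumed estimator (not jGRP's): P(up) and P(down) for one case-sample
    value relative to the same study's controls. Ties count as 'no change'."""
    if not controls:
        raise ValueError("need at least one control sample")
    n = len(controls)
    p_up = sum(1 for c in controls if c < value) / n
    p_down = sum(1 for c in controls if c > value) / n
    return p_up, p_down


def regulation_profile(study: Study) -> Dict[str, List[Tuple[float, float]]]:
    """Map each gene measured in both groups to its per-case-sample (P(up), P(down))."""
    control, case = study
    return {
        gene: [regulation_probs(x, control[gene]) for x in values]
        for gene, values in case.items()
        if gene in control
    }


def differential_score(studies: List[Study]) -> Dict[str, float]:
    """Assumed pooled statistic: mean of P(up) - P(down) over all case samples
    of all studies; -1 means consistently down, +1 consistently up."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for study in studies:
        for gene, probs in regulation_profile(study).items():
            for p_up, p_down in probs:
                totals[gene] = totals.get(gene, 0.0) + (p_up - p_down)
                counts[gene] = counts.get(gene, 0) + 1
    return {gene: totals[gene] / counts[gene] for gene in totals}


# Toy data: study B is on a roughly 100x larger scale than study A,
# standing in for a different platform.
study_a: Study = (
    {"G1": [1.0, 2.0, 3.0], "G2": [5.0, 6.0, 7.0]},
    {"G1": [4.0, 5.0], "G2": [5.5, 6.5]},
)
study_b: Study = (
    {"G1": [100.0, 200.0], "G2": [500.0, 700.0]},
    {"G1": [900.0], "G2": [600.0]},
)

scores = differential_score([study_a, study_b])
print(scores)  # -> {'G1': 1.0, 'G2': 0.0}
```

In this sketch each probability is computed by comparing a sample only against its own study's controls. The larger scale of study B therefore does not distort its contribution, and G1 comes out as consistently up-regulated in both studies. The abstract does not say whether jGRP handles cross-platform scale differences in the same way.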

CONCLUSIONS

Data heterogeneity strongly influences the performance of meta-analyses for DEG identification. Existing meta-analysis methods were found to differ markedly in their sensitivity to study heterogeneity. Owing to its unified framework and controllable handling of study heterogeneity, the proposed method, jGRP, can serve as a standalone tool.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e15/5568075/ea52ebbac8f7/12859_2017_1794_Fig1_HTML.jpg

Similar articles

A regulation probability model-based meta-analysis of multiple transcriptomics data sets for cancer biomarker identification.

BMC Bioinformatics. 2017 Aug 23;18(1):375. doi: 10.1186/s12859-017-1794-6.

Adaptively capturing the heterogeneity of expression for cancer biomarker identification.

BMC Bioinformatics. 2018 Nov 3;19(1):401. doi: 10.1186/s12859-018-2437-2.

jNMFMA: a joint non-negative matrix factorization meta-analysis of transcriptomics data.

Bioinformatics. 2015 Feb 15;31(4):572-80. doi: 10.1093/bioinformatics/btu679. Epub 2014 Oct 16.

Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.

Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.

Integrated bioinformatics analysis for differentially expressed genes and signaling pathways identification in gastric cancer.

Int J Med Sci. 2021 Jan 1;18(3):792-800. doi: 10.7150/ijms.47339. eCollection 2021.

Identification and validation of core genes in tumor-educated platelets for human gastrointestinal tumor diagnosis using network-based transcriptomic analysis.

Platelets. 2023 Dec;34(1):2212071. doi: 10.1080/09537104.2023.2212071.

An efficient concordant integrative analysis of multiple large-scale two-sample expression data sets.

Bioinformatics. 2017 Dec 1;33(23):3852-3860. doi: 10.1093/bioinformatics/btx061.

Investigation of genes and pathways involved in breast cancer subtypes through gene expression meta-analysis.

Gene. 2022 May 5;821:146328. doi: 10.1016/j.gene.2022.146328. Epub 2022 Feb 16.

A semi-parametric statistical model for integrating gene expression profiles across different platforms.

BMC Bioinformatics. 2016 Jan 11;17 Suppl 1(Suppl 1):5. doi: 10.1186/s12859-015-0847-y.

Integrative meta-analysis of multiple gene expression profiles in acquired gemcitabine-resistant cancer cell lines to identify novel therapeutic biomarkers.

Asian Pac J Cancer Prev. 2015;16(7):2793-800. doi: 10.7314/apjcp.2015.16.7.2793.

