Bayesian Framework for Detecting Gene Expression Outliers in Individual Samples.

Author information

Computational Genomics Laboratory, University of California, Santa Cruz, Santa Cruz, CA.

Molecular, Cell, and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA.

Publication information

JCO Clin Cancer Inform. 2020 Feb;4:160-170. doi: 10.1200/CCI.19.00095.

DOI:10.1200/CCI.19.00095

PMID:32097024

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7053807/

Abstract

PURPOSE

Many antineoplastics are designed to target upregulated genes, but quantifying upregulation in a single patient sample requires an appropriate set of samples for comparison. In cancer, the most natural comparison set is unaffected samples from the matching tissue, but there are often too few available unaffected samples to overcome high intersample variance. Moreover, some cancer samples have misidentified tissues of origin or even composite-tissue phenotypes. Even if an appropriate comparison set can be identified, most differential expression tools are not designed to accommodate comparisons to a single patient sample.

METHODS

We propose a Bayesian statistical framework for gene expression outlier detection in single samples. Our method uses all available data to produce a consensus background distribution for each gene of interest without requiring the researcher to manually select a comparison set. The consensus distribution can then be used to quantify over- and underexpression.
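The idea of a consensus background can be illustrated with a deliberately simplified sketch. This is not the authors' implementation: it assumes each background group can be summarized per gene by a normal distribution, weights the groups by how well they explain the sample on a set of reference genes (uniform prior over groups), and then reads off an upper-tail probability for a gene of interest from the weighted mixture. The function names, the group labels, the gene labels, and the toy values are all hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def log_normal_pdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x, computed directly to avoid underflow."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def group_weights(sample, backgrounds, reference_genes):
    """Posterior weight of each background group under a uniform prior.

    sample: dict gene -> expression value of the single sample.
    backgrounds: dict group -> dict gene -> list of expression values.
    reference_genes: genes used to judge how well each group matches the sample.
    """
    log_lik = {}
    for group, genes in backgrounds.items():
        log_lik[group] = sum(
            log_normal_pdf(sample[g], mean(genes[g]), stdev(genes[g]))
            for g in reference_genes
        )
    # Subtract the largest log-likelihood before exponentiating for stability.
    peak = max(log_lik.values())
    unnorm = {grp: math.exp(ll - peak) for grp, ll in log_lik.items()}
    total = sum(unnorm.values())
    return {grp: w / total for grp, w in unnorm.items()}


def overexpression_tail(gene, value, weights, backgrounds):
    """P(background >= value) under the weighted mixture of per-group normals."""
    tail = 0.0
    for group, w in weights.items():
        values = backgrounds[group][gene]
        dist = NormalDist(mean(values), stdev(values))
        tail += w * (1.0 - dist.cdf(value))
    return tail


# Hypothetical toy data (made-up values on an arbitrary log scale).
backgrounds = {
    "tissue_A": {
        "G1": [5.0, 5.2, 4.8, 5.1],
        "G2": [2.0, 2.1, 1.9, 2.2],
        "TARGET": [3.0, 3.2, 2.9, 3.1],
    },
    "tissue_B": {
        "G1": [9.0, 9.3, 8.8, 9.1],
        "G2": [6.0, 6.2, 5.9, 6.1],
        "TARGET": [7.0, 7.1, 6.8, 7.2],
    },
}
sample = {"G1": 5.05, "G2": 2.05, "TARGET": 6.0}

weights = group_weights(sample, backgrounds, ["G1", "G2"])
tail = overexpression_tail("TARGET", sample["TARGET"], weights, backgrounds)
```

In this toy case the sample resembles tissue_A on the reference genes, so tissue_A dominates the consensus and the TARGET value is judged against it rather than against tissue_B, where the same value would look unremarkable. A small tail probability suggests overexpression; the lower tail could be used the same way for underexpression.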

RESULTS

We demonstrate this method on both simulated and real gene expression data. We show that it can robustly quantify overexpression, even when the set of comparison samples lacks ideally matched tissue samples. Furthermore, our results show that the method can identify appropriate comparison sets from samples of mixed lineage and rediscover numerous known gene-cancer expression patterns.

CONCLUSION

This exploratory method is suitable for identifying expression outliers from comparative RNA sequencing (RNA-seq) analysis for individual samples. Treehouse, a pediatric precision medicine group that leverages RNA-seq to identify potential therapeutic leads for patients, plans to explore this method for processing its pediatric cohort.
