Computational Genomics Laboratory, University of California, Santa Cruz, Santa Cruz, CA.
Molecular, Cell, and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA.
JCO Clin Cancer Inform. 2020 Feb;4:160-170. doi: 10.1200/CCI.19.00095.
Many antineoplastics are designed to target upregulated genes, but quantifying upregulation in a single patient sample requires an appropriate set of samples for comparison. In cancer, the most natural comparison set is unaffected samples from the matching tissue, but there are often too few available unaffected samples to overcome high intersample variance. Moreover, some cancer samples have misidentified tissues of origin or even composite-tissue phenotypes. Even if an appropriate comparison set can be identified, most differential expression tools are not designed to accommodate comparisons to a single patient sample.
We propose a Bayesian statistical framework for gene expression outlier detection in single samples. Our method uses all available data to produce a consensus background distribution for each gene of interest without requiring the researcher to manually select a comparison set. The consensus distribution can then be used to quantify over- and underexpression.
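To make the idea of a consensus background distribution concrete, here is a minimal Python sketch. It is not the authors' model, and every simplifying assumption in it is ours. Each tissue's background for a gene is treated as a normal distribution on log expression. Tissue weights are posterior probabilities from Bayes' rule, using a uniform prior over tissues and a few hypothetical marker genes (`markers`). The consensus distribution is the weighted mixture of the per-tissue normals, and the outlier score is its upper-tail probability at the patient's value. The function names and toy data are hypothetical.

```python
import math
from statistics import mean, stdev


def normal_logpdf(x, mu, sigma):
    # Log density of a normal distribution (simplifying assumption: per-tissue normals).
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def normal_sf(x, mu, sigma):
    # Upper-tail probability P(X >= x) for a normal distribution.
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2)))


def tissue_weights(patient, background, markers):
    """Hypothetical helper: posterior P(tissue | patient) from marker genes.

    patient:    dict gene -> log expression of the single sample
    background: dict tissue -> dict gene -> list of log expression values
    markers:    genes used to infer the comparison set (assumed, not from the paper)
    Uses a uniform prior over tissues and treats genes as independent.
    """
    log_post = {
        tissue: sum(
            normal_logpdf(patient[g], mean(genes[g]), stdev(genes[g]))
            for g in markers
        )
        for tissue, genes in background.items()
    }
    # Normalize in log space so very small likelihoods do not underflow.
    top = max(log_post.values())
    unnorm = {t: math.exp(lp - top) for t, lp in log_post.items()}
    total = sum(unnorm.values())
    return {t: w / total for t, w in unnorm.items()}


def outlier_pvalue(gene, patient, background, weights):
    """Upper-tail probability of the patient's value under the weighted mixture."""
    x = patient[gene]
    return sum(
        w * normal_sf(x, mean(background[t][gene]), stdev(background[t][gene]))
        for t, w in weights.items()
    )


# Toy usage with made-up data: GENE_A separates the tissues, GENE_B is the gene of interest.
background = {
    "liver": {"GENE_A": [1.0, 1.2, 0.8], "GENE_B": [5.0, 5.2, 4.8]},
    "kidney": {"GENE_A": [4.0, 4.2, 3.8], "GENE_B": [1.0, 1.1, 0.9]},
}
patient = {"GENE_A": 1.1, "GENE_B": 9.0}
weights = tissue_weights(patient, background, markers=["GENE_A"])
p_over = outlier_pvalue("GENE_B", patient, background, weights)
```

Because the mixture is continuous, a lower-tail score for underexpression is simply `1 - p_over`. Weighting every tissue, instead of discarding the ones that look unlike the sample, mirrors the abstract's point that no comparison set has to be chosen by hand.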
We demonstrate this method on both simulated and real gene expression data. We show that it can robustly quantify overexpression, even when the set of comparison samples lacks ideally matched tissue samples. Furthermore, our results show that the method can identify appropriate comparison sets from samples of mixed lineage and rediscover numerous known gene-cancer expression patterns.
This exploratory method is suitable for identifying expression outliers in individual samples through comparative RNA sequencing (RNA-seq) analysis. Treehouse, a pediatric precision medicine group that uses RNA-seq to identify potential therapeutic leads for patients, plans to explore this method for processing its pediatric cohort.