Personalized regression enables sample-specific pan-cancer analysis.

Author information

Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, USA.

Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA.

Publication information

Bioinformatics. 2018 Jul 1;34(13):i178-i186. doi: 10.1093/bioinformatics/bty250.

Abstract

MOTIVATION

In many applications, inter-sample heterogeneity is crucial to understanding the complex biological processes under study. For example, in genomic analysis of cancers, each patient in a cohort may have a different driver mutation, making it difficult or impossible to identify causal mutations from an averaged view of the entire cohort. Unfortunately, many traditional methods for genomic analysis seek to estimate a single model which is shared by all samples in a population, ignoring this inter-sample heterogeneity entirely. In order to better understand patient heterogeneity, it is necessary to develop practical, personalized statistical models.

RESULTS

To uncover this inter-sample heterogeneity, we propose a novel regularizer for achieving patient-specific personalized estimation. This regularizer operates by learning two latent distance metrics, one between personalized parameters and one between clinical covariates, and attempting to match the induced distances as closely as possible. Crucially, we do not assume these distance metrics are already known. Instead, we allow the data to dictate the structure of these latent distance metrics. Finally, we apply our method to learn patient-specific, interpretable models for a pan-cancer gene expression dataset containing samples from more than 30 distinct cancer types and find strong evidence of personalization effects between cancer types as well as between individuals. Our analysis uncovers sample-specific aberrations that are overlooked by population-level methods, suggesting a promising new path for precision analysis of complex diseases such as cancer.
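
The distance-matching idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the weighted L1 form of both distances, the squared mismatch penalty, and every function and variable name below are assumptions made for illustration. In the method as described, the distance weights would be learned from the data together with the personalized parameters; here they are fixed by hand so the penalty can be checked on a toy example.

```python
import numpy as np


def weighted_l1_distances(points, weights):
    """Pairwise distances d(i, j) = sum_k weights[k] * |points[i, k] - points[j, k]|.

    Assumed (hypothetical) distance form; the weights play the role of a
    latent distance metric.
    """
    diffs = np.abs(points[:, None, :] - points[None, :, :])  # shape (n, n, d)
    return diffs @ weights  # shape (n, n)


def distance_matching_penalty(params, covariates, param_weights, covariate_weights):
    """Squared mismatch between parameter distances and covariate distances.

    params:     (n, p) array, one row of model parameters per sample
    covariates: (n, k) array, one row of clinical covariates per sample
    Each unordered pair of samples is counted once.
    """
    d_params = weighted_l1_distances(params, param_weights)
    d_covs = weighted_l1_distances(covariates, covariate_weights)
    upper = np.triu_indices(params.shape[0], k=1)  # pairs with i < j
    return 0.5 * np.sum((d_params[upper] - d_covs[upper]) ** 2)


# Toy data (made up): samples 0 and 1 share covariates and parameters,
# sample 2 differs in both.
toy_params = np.array([[0.0, 1.0], [0.0, 1.0], [2.0, 1.0]])
toy_covariates = np.array([[0.0], [0.0], [1.0]])

# With a covariate weight of 2, the covariate distances equal the parameter
# distances for every pair, so the penalty vanishes.
matched = distance_matching_penalty(
    toy_params, toy_covariates, np.array([1.0, 1.0]), np.array([2.0])
)

# With a covariate weight of 1, two pairs are each off by 1.
mismatched = distance_matching_penalty(
    toy_params, toy_covariates, np.array([1.0, 1.0]), np.array([1.0])
)
```

In a full estimator, a penalty of this kind would be added to the per-sample prediction loss, so that samples with similar covariates are encouraged, but not forced, to have similar parameters.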

AVAILABILITY AND IMPLEMENTATION

Software for personalized linear and personalized logistic regression, along with code to reproduce experimental results, is freely available at github.com/blengerich/personalized_regression.
