Accurate and reliable cancer classification based on probabilistic inference of pathway activity.

Affiliation

Department of Electrical and Computer Engineering, Texas A&M University, College Station, Texas, United States of America.

Publication information

PLoS One. 2009 Dec 7;4(12):e8161. doi: 10.1371/journal.pone.0008161.

DOI:10.1371/journal.pone.0008161

PMID:19997592

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2781165/

Abstract

With the advent of high-throughput technologies for measuring genome-wide expression profiles, a large number of methods have been proposed for discovering diagnostic markers that can accurately discriminate between different classes of a disease. However, factors such as the small sample size of typical clinical data, the inherent noise in high-throughput measurements, and the heterogeneity across different samples often make it difficult to find reliable gene markers. To overcome this problem, several studies have proposed the use of pathway-based markers, instead of individual gene markers, for building the classifier. Given a set of known pathways, these methods estimate the activity level of each pathway by summarizing the expression values of its member genes, and use the pathway activities for classification. It has been shown that pathway-based classifiers typically yield more reliable results than traditional gene-based classifiers. In this paper, we propose a new classification method based on probabilistic inference of pathway activities. For a given sample, we compute the log-likelihood ratio between different disease phenotypes based on the expression level of each gene. The activity of a given pathway is then inferred by combining the log-likelihood ratios of the constituent genes. We apply the proposed method to the classification of breast cancer metastasis, and show that it achieves higher accuracy and identifies more reproducible pathway markers than several existing pathway activity inference methods.
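The inference step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the class-conditional densities are modelled or how the per-gene ratios are combined, so the sketch assumes a Gaussian density per gene and phenotype and a plain sum over member genes. All function names, the pathway name `toy_pathway`, and the toy data are hypothetical.

```python
import numpy as np

def fit_gene_gaussians(X, y):
    """Estimate a per-gene mean and standard deviation for each phenotype.

    X: (n_samples, n_genes) expression matrix; y: binary labels (0 or 1).
    Assumption: each gene is Gaussian within a phenotype.
    """
    params = {}
    for label in (0, 1):
        Xc = X[y == label]
        # Small constant guards against a zero standard deviation.
        params[label] = (Xc.mean(axis=0), Xc.std(axis=0, ddof=1) + 1e-8)
    return params

def gaussian_logpdf(x, mean, std):
    """Log density of a univariate Gaussian, applied element-wise."""
    return -0.5 * np.log(2 * np.pi * std**2) - 0.5 * ((x - mean) / std) ** 2

def gene_llr(X, params):
    """Per-gene log-likelihood ratio: log p(x | class 1) - log p(x | class 0)."""
    mean0, std0 = params[0]
    mean1, std1 = params[1]
    return gaussian_logpdf(X, mean1, std1) - gaussian_logpdf(X, mean0, std0)

def pathway_activity(llr, pathways):
    """Combine gene-level ratios into one activity score per pathway.

    Assumption: the ratios of the member genes are simply summed.
    pathways maps a pathway name to a list of gene column indices.
    """
    return {name: llr[:, idx].sum(axis=1) for name, idx in pathways.items()}

# Toy data: two genes, two samples per phenotype (hypothetical values).
X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]])
y = np.array([0, 0, 1, 1])

params = fit_gene_gaussians(X, y)
llr = gene_llr(X, params)
activity = pathway_activity(llr, {"toy_pathway": [0, 1]})
print(activity["toy_pathway"])
```

On this toy data the activity score is negative for the class-0 samples and positive for the class-1 samples. The resulting pathway scores could then serve as input features to any standard classifier. In practice the density parameters would be estimated on training samples only and applied to held-out samples.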
