Biomedical Informatics Training Program, Stanford University, Stanford, CA 94305, USA.
Inflammation & Immunology, Pfizer Inc., Cambridge, MA 02139, USA.
Bioinformatics. 2018 Mar 15;34(6):985-993. doi: 10.1093/bioinformatics/btx651.
Gene-based supervised machine learning classification models have been widely used to differentiate disease states, predict disease progression and determine effective treatment options. However, many of these classifiers are sensitive to noise and frequently fail to replicate in external validation sets. For complex, heterogeneous diseases, these classifiers are further limited because they cannot capture the varying combinations of genes that lead to the same phenotype. Pathway-based classification can overcome these challenges by using robust, aggregate features to represent biological mechanisms. In this work, we developed a novel pathway-based approach, PRObabilistic Pathway Score (PROPS), which uses genes to calculate individualized pathway scores for classification. Unlike previous individualized pathway-based classification methods that use gene sets, we incorporate gene interactions using probabilistic graphical models to more accurately represent the underlying biology and achieve better performance. We apply our method to differentiate two similar complex diseases, ulcerative colitis (UC) and Crohn's disease (CD), which are the two main types of inflammatory bowel disease (IBD). Using five IBD datasets, we compare our method against four gene-based and four alternative pathway-based classifiers in distinguishing CD from UC. We demonstrate superior classification performance and provide biological insight into the top pathways separating CD from UC.
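The abstract does not say how an individualized pathway score is computed from a probabilistic graphical model, so the following is only an illustrative sketch under stated assumptions, not the PROPS package's implementation or API. It assumes a linear-Gaussian Bayesian network over a pathway's genes, fitted on a reference group of samples, with each sample's log-likelihood under that network used as its pathway score. The function names (`fit_gaussian_bn`, `pathway_score`), the three-gene chain and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_gaussian_bn(X, parents):
    """Fit a linear-Gaussian Bayesian network (assumed model, hypothetical helper).

    X: array of shape (n_samples, n_genes).
    parents: dict mapping a gene index to the list of its parent gene indices.
    Each gene is regressed on its parents by least squares.
    """
    n = X.shape[0]
    params = {}
    for node in range(X.shape[1]):
        pa = parents.get(node, [])
        # Design matrix: intercept column plus one column per parent gene.
        A = np.column_stack([np.ones(n)] + [X[:, p] for p in pa])
        coef, *_ = np.linalg.lstsq(A, X[:, node], rcond=None)
        resid = X[:, node] - A @ coef
        var = max(resid.var(), 1e-8)  # guard against zero residual variance
        params[node] = (pa, coef, var)
    return params

def pathway_score(X, params):
    """Per-sample log-likelihood under the fitted network (the assumed score)."""
    n = X.shape[0]
    ll = np.zeros(n)
    for node, (pa, coef, var) in params.items():
        A = np.column_stack([np.ones(n)] + [X[:, p] for p in pa])
        mu = A @ coef
        ll += -0.5 * (np.log(2 * np.pi * var) + (X[:, node] - mu) ** 2 / var)
    return ll

# Synthetic three-gene pathway with edges g0 -> g1 -> g2 (made-up data).
rng = np.random.default_rng(0)
parents = {1: [0], 2: [1]}

def simulate(n, intact):
    g0 = rng.normal(size=n)
    # In "perturbed" samples, g1 no longer depends on g0.
    g1 = 2 * g0 + rng.normal(scale=0.1, size=n) if intact else rng.normal(size=n)
    g2 = -g1 + rng.normal(scale=0.1, size=n)
    return np.column_stack([g0, g1, g2])

reference = simulate(200, intact=True)
params = fit_gaussian_bn(reference, parents)

ref_scores = pathway_score(simulate(50, intact=True), params)
pert_scores = pathway_score(simulate(50, intact=False), params)
print(ref_scores.mean(), pert_scores.mean())
```

In this sketch, samples whose gene interactions deviate from the reference network receive lower scores. Computing one such score per pathway would give a sample-by-pathway feature matrix that could then be passed to any standard classifier.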
PROPS is available as an R package, which can be downloaded from http://simtk.org/home/props or from Bioconductor.
Supplementary data are available at Bioinformatics online.