Algorithmic Bioinformatics, Bonn-Aachen International Center for IT, Bonn, Germany.
PLoS One. 2013 Sep 3;8(9):e73074. doi: 10.1371/journal.pone.0073074. eCollection 2013.
Predictive, stable and interpretable gene signatures are generally seen as an important step towards better personalized medicine. During the last decade, various methods have been proposed for that purpose. However, one important obstacle to making gene signatures a standard tool in the clinic is the typically low reproducibility of signatures, combined with the difficulty of achieving a clear biological interpretation. For this reason, recent years have seen growing interest in approaches that integrate information from molecular interaction networks. Here we propose a technique that integrates network information as well as different kinds of experimental data (exemplified here by mRNA and miRNA expression) into one classifier. This is done by smoothing t-statistics of individual genes or miRNAs over the structure of a combined protein-protein interaction (PPI) and miRNA-target gene network. A permutation test is conducted to select features in a highly consistent manner, and subsequently a Support Vector Machine (SVM) classifier is trained. Compared to several competing methods, our algorithm shows overall better prediction performance for early versus late disease relapse and higher signature stability. Moreover, the obtained gene lists can be clearly associated with biological knowledge, such as known disease genes and KEGG pathways. We demonstrate that our data integration strategy can improve classification performance compared to using a single data source only. Our method, called stSVM, is available in the R package netClass on CRAN (http://cran.r-project.org).
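The pipeline described above (per-feature t-statistics, smoothing over a network, permutation-based feature selection) can be illustrated with a minimal Python sketch. This is not the netClass implementation and does not reproduce its API. The smoothing rule (blending each node's score with the mean of its neighbours' scores), the parameters `alpha`, `steps`, `n_perm` and `level`, all function names and the toy data are assumptions made for illustration only. The final SVM training step is left out; the selected columns would be passed to any SVM implementation.

```python
import random
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic between two lists of values (0.0 if no spread)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def t_stats(X, y):
    """Per-feature t-statistic; X is samples x features, y holds 0/1 labels."""
    stats = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        stats.append(welch_t(pos, neg))
    return stats

def smooth(scores, neighbors, alpha=0.5, steps=2):
    """Assumed smoothing rule: blend each node's own score with the mean
    of its neighbours' current scores; isolated nodes keep their score."""
    current = list(scores)
    for _ in range(steps):
        updated = []
        for i, nb in enumerate(neighbors):
            nb_mean = sum(current[k] for k in nb) / len(nb) if nb else current[i]
            updated.append((1 - alpha) * scores[i] + alpha * nb_mean)
        current = updated
    return current

def select_features(X, y, neighbors, n_perm=200, level=0.05, seed=0):
    """Permutation test on smoothed |t|: shuffle labels, recompute the
    smoothed statistics, and count how often the null matches or beats
    the observed value for each feature."""
    rng = random.Random(seed)
    observed = [abs(v) for v in smooth(t_stats(X, y), neighbors)]
    exceed = [0] * len(observed)
    labels = list(y)
    for _ in range(n_perm):
        rng.shuffle(labels)
        null = smooth(t_stats(X, labels), neighbors)
        for j, v in enumerate(null):
            if abs(v) >= observed[j]:
                exceed[j] += 1
    pvals = [(c + 1) / (n_perm + 1) for c in exceed]
    return [j for j, p in enumerate(pvals) if p <= level], pvals

# Toy data (hypothetical): features 0 and 1 separate the classes,
# features 2 and 3 are noise. Network: 0 - 1 - 2, feature 3 isolated.
f0 = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 0.1, -0.2, 0.0, 0.3, -0.1, 0.2]
f1 = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2, 1.0, 1.2, 0.9, 1.1, 0.8, 1.3]
f2 = [0.5, 0.1, 0.9, 0.3, 0.7, 0.2, 0.6, 0.4, 0.8, 0.0, 0.35, 0.55]
f3 = [0.2, 0.9, 0.4, 0.6, 0.1, 0.8, 0.3, 0.7, 0.5, 0.95, 0.15, 0.45]
X = [list(row) for row in zip(f0, f1, f2, f3)]
y = [1] * 6 + [0] * 6
neighbors = [[1], [0, 2], [1], []]

selected, pvals = select_features(X, y, neighbors)
print("selected feature indices:", selected)
print("permutation p-values:", [round(p, 3) for p in pvals])
```

In this sketch a noise feature that neighbours a strongly differential one can inherit part of its score, which is the intended effect of network smoothing: features are judged together with their network context rather than in isolation.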