German Cancer Research Center, Cancer Genome Research, Im Neuenheimer Feld 460, 69120 Heidelberg, Germany.
BMC Bioinformatics. 2011 Dec 21;12:488. doi: 10.1186/1471-2105-12-488.
A main goal of cancer studies that collect high-throughput microRNA (miRNA) and mRNA data is to find and assess prognostic signatures capable of predicting clinical outcome. Expression changes in both mRNA and miRNA have been reported to reflect clinical characteristics such as staging and prognosis. Furthermore, miRNA abundance can directly affect target transcripts and their translation in tumor cells. Prediction models are trained to identify either an mRNA or a miRNA signature for patient stratification. As a growing number of microarray studies collect mRNA and miRNA from the same patient cohort, statistical methods are needed to integrate, or fuse, both kinds of data into one prediction model and thereby find a combined signature that improves prediction.
Here, we propose a new method to fuse miRNA and mRNA data into one prediction model. Because miRNAs are known regulators of mRNAs, we used the correlations between them, together with target prediction information, to build a bipartite graph representing the relations between miRNAs and mRNAs. This graph was used to guide feature selection in order to improve prediction. The method is illustrated on a prostate cancer data set comprising 98 patient samples with miRNA and mRNA expression data, with biochemical relapse as the clinical endpoint. The bipartite graph, in combination with both data sets, improved prediction performance as well as the stability of the feature selection.
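To make the graph construction concrete, the following is a minimal Python sketch and not the R code from the supplement. It assumes, purely for illustration, that an edge is kept when a miRNA-mRNA pair is both a predicted target pair and has an absolute Pearson correlation above a cutoff. The function names, the `min_abs_corr` parameter, the rule for combining the two evidence sources, and the toy expression values are all hypothetical and not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / (sd_x * sd_y)


def build_bipartite_graph(mirna_expr, mrna_expr, predicted_targets,
                          min_abs_corr=0.5):
    """Return the set of (miRNA, mRNA) edges supported by both evidence types.

    Assumption (not from the paper): an edge requires a target prediction
    AND |correlation| >= min_abs_corr across the same patient samples.
    """
    edges = set()
    for mirna, targets in predicted_targets.items():
        if mirna not in mirna_expr:
            continue
        for gene in targets:
            if gene not in mrna_expr:
                continue
            r = pearson(mirna_expr[mirna], mrna_expr[gene])
            if abs(r) >= min_abs_corr:
                edges.add((mirna, gene))
    return edges


def connected_features(edges):
    """Split the graph's nodes into its miRNA side and its mRNA side."""
    return {m for m, _ in edges}, {g for _, g in edges}


# Toy data: five samples per feature; all identifiers are made up.
mirna_expr = {"miR-A": [1, 2, 3, 4, 5], "miR-B": [2, 1, 2, 1, 2]}
mrna_expr = {
    "GENE1": [5, 4, 3, 2, 1],
    "GENE2": [1, 3, 2, 5, 4],
    "GENE3": [2, 2, 3, 3, 2],
}
predicted_targets = {"miR-A": {"GENE1", "GENE3"}, "miR-B": {"GENE2"}}

edges = build_bipartite_graph(mirna_expr, mrna_expr, predicted_targets)
mirna_nodes, mrna_nodes = connected_features(edges)
```

In this toy example the pair miR-A/GENE3 is a predicted target pair but is dropped because its correlation falls below the cutoff. One plausible way to let such a graph guide feature selection would be to prefer features that appear as nodes in it, although that too is an assumption about the method rather than a description of it.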
Fusing mRNA and miRNA expression data into one prediction model improves clinical outcome prediction, both in terms of prediction error and in terms of the stability of feature selection. The R source code of the proposed method is available in the supplement.