Ow Ghim Siong, Tang Zhiqun, Kuznetsov Vladimir A
Bioinformatics Institute, Singapore 138671.
School of Computer Engineering, Nanyang Technological University, Singapore 639798.
Oncotarget. 2016 Jun 28;7(26):40200-40220. doi: 10.18632/oncotarget.9571.
The era of big data and precision medicine has led to the accumulation of massive datasets of patient gene expression data and clinical information. For a new patient, we propose that identifying a highly similar reference patient in an existing patient database, via similarity matching of both clinical and expression data, could be useful for predicting prognostic risk or therapeutic efficacy. Here, we propose a novel methodology to predict disease/treatment outcome by analyzing the similarity between any pair of patients, each of whom is characterized by a set of pre-defined biological variables (biomarkers or clinical features) represented initially as a prognostic binary variable vector (PBVV) and subsequently transformed into a prognostic signature vector (PSV). Our analyses revealed that the Euclidean distance, rather than the correlation distance, was effective in defining an unbiased similarity measure between two PSVs. We applied our method to high-grade serous ovarian cancer (HGSC) using a 36-mRNA predictor that was previously shown to stratify patients into 3 distinct prognostic subgroups. We found that patient age, when converted into a binary variable, was positively correlated with the overall risk of succumbing to the disease. When applied to an independent testing dataset, the inclusion of age in the molecular predictor provided a more robust personalized prognosis of overall survival, correlated with the therapeutic response of HGSC, and provided a benefit for treatment targeting of tumors in HGSC patients. Finally, our method can be generalized and implemented in many other diseases to accurately predict individual patients' outcomes.
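The similarity-matching idea described in the abstract can be illustrated with a minimal sketch. It assumes each patient is already represented by a numeric signature vector; the abstract does not specify the PBVV-to-PSV transformation, so none is attempted here. The function names (`euclidean_distance`, `correlation_distance`, `most_similar_reference`) and the toy vectors are hypothetical and are not taken from the paper.

```python
import numpy as np


def euclidean_distance(u, v):
    """Euclidean distance between two signature vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.linalg.norm(u - v))


def correlation_distance(u, v):
    """1 - Pearson correlation; NaN when either vector has zero variance."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    uc = u - u.mean()
    vc = v - v.mean()
    denom = np.linalg.norm(uc) * np.linalg.norm(vc)
    if denom == 0.0:
        return float("nan")
    return float(1.0 - (uc @ vc) / denom)


def most_similar_reference(new_psv, reference_psvs):
    """Return (index, distance) of the reference vector closest to new_psv
    under the Euclidean distance."""
    refs = np.asarray(reference_psvs, dtype=float)
    query = np.asarray(new_psv, dtype=float)
    distances = np.linalg.norm(refs - query, axis=1)
    idx = int(np.argmin(distances))
    return idx, float(distances[idx])


# Hypothetical toy data: three reference patients, four variables each.
reference_psvs = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
]
new_psv = [1, 0, 0, 1]

idx, dist = most_similar_reference(new_psv, reference_psvs)
print(f"closest reference patient: {idx}, Euclidean distance: {dist:.3f}")

# The correlation distance is undefined for a constant vector such as the
# third reference patient, whereas the Euclidean distance is always defined.
print(correlation_distance(new_psv, reference_psvs[2]))
```

In this toy setting, the outcome recorded for the closest reference patient would serve as the prediction for the new patient. The sketch also shows one general property of the two measures: the correlation distance cannot be computed when a vector has zero variance. Whether this relates to the authors' finding is not stated in the abstract.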