Park Heewon, Shimamura Teppei, Miyano Satoru, Imoto Seiya
Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan.
PLoS One. 2014 Oct 17;9(10):e108990. doi: 10.1371/journal.pone.0108990. eCollection 2014.
The personal genomics era has drawn considerable attention to patient-specific analysis for anti-cancer therapy. Patient-specific analysis reveals the genomic characteristics of each individual patient, which makes it possible to predict individual genetic risk of disease and to personalize anti-cancer therapy. Although existing methods for patient-specific analysis have successfully uncovered crucial biomarkers, their performance degrades sharply in the presence of outliers, because they rely on non-robust estimation. In practice, clinical and genomic alteration datasets usually contain outliers from various sources (e.g., experimental error or coding error), and these outliers may significantly affect the results of patient-specific analysis. We propose a robust methodology for patient-specific analysis in line with NetworkProfiler. In the proposed method, outliers in high-dimensional gene expression levels and drug response datasets are simultaneously controlled by a robust Mahalanobis distance in robust principal component space. Thus, we can effectively predict anti-cancer drug sensitivity and identify sensitivity-specific biomarkers for individual patients. Monte Carlo simulations show that the proposed robust method achieves outstanding performance in predicting the response variable in the presence of outliers. We also apply the proposed methodology to the Sanger dataset to uncover cancer biomarkers and predict anti-cancer drug sensitivity, and show the effectiveness of our method.
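As a rough illustration of controlling outliers with a robust distance in principal component space, the sketch below may help. It is not the authors' implementation, and it rests on several assumptions: the samples are taken to be already projected onto a small number of principal components (treated as uncorrelated), location and scale are estimated by the median and the scaled median absolute deviation, and samples beyond a chi-square cutoff receive weight zero. The function names (`mad_scale`, `robust_distances`, `outlier_weights`), the toy scores, and the hard-rejection weighting are all hypothetical choices made for this example.

```python
import math
from statistics import median


def mad_scale(values):
    """Median absolute deviation, scaled by 1.4826 so that it is
    consistent with the standard deviation under normality."""
    center = median(values)
    return 1.4826 * median(abs(v - center) for v in values)


def robust_distances(scores):
    """Robust Mahalanobis-type distance for each sample.

    `scores` holds one row per sample with its coordinates on k principal
    components. Assumption: the components are uncorrelated, so the
    distance reduces to the Euclidean norm of robustly standardized scores.
    """
    k = len(scores[0])
    columns = [[row[j] for row in scores] for j in range(k)]
    centers = [median(col) for col in columns]
    scales = [mad_scale(col) for col in columns]
    if any(s == 0 for s in scales):
        raise ValueError("a component has zero robust scale")
    return [
        math.sqrt(sum(((row[j] - centers[j]) / scales[j]) ** 2 for j in range(k)))
        for row in scores
    ]


def outlier_weights(distances, cutoff):
    """Hard-rejection weights (an assumption): 1 within the cutoff, 0 beyond."""
    return [1.0 if d <= cutoff else 0.0 for d in distances]


# Toy scores on two components; the last sample is a gross outlier.
scores = [(0.1, -0.2), (-0.3, 0.1), (0.2, 0.3), (-0.1, -0.1), (0.0, 0.2), (8.0, -7.0)]

# Square root of the 0.975 chi-square quantile with 2 degrees of freedom,
# which has the closed form -2 * ln(1 - 0.975).
cutoff = math.sqrt(-2.0 * math.log(0.025))

distances = robust_distances(scores)
weights = outlier_weights(distances, cutoff)
print(weights)  # [1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
```

Weights of this kind could then multiply each sample's contribution in a downstream regression so that flagged samples do not drive the fit. A smooth down-weighting function would be an equally plausible choice, since the abstract does not specify one.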