Cheerla Nikhil, Gevaert Olivier
Monta Vista High School, Cupertino, CA, USA.
Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, and Department of Biomedical Data Science, Stanford University, 1265 Welch Rd, Stanford, CA, USA.
BMC Bioinformatics. 2017 Jan 13;18(1):32. doi: 10.1186/s12859-016-1421-y.
The current state of the art in cancer diagnosis and treatment is not ideal: diagnostic tests are accurate but invasive, and treatments are "one-size-fits-all" rather than personalized. Recently, miRNAs have garnered significant attention as cancer biomarkers, owing to their stability and ease of access (circulating miRNA in the blood). Many studies have shown the effectiveness of miRNA data in diagnosing specific cancer types, but few have explored the role of miRNA in predicting treatment outcome.
Here we go a step further, using tissue miRNA and clinical data across 21 cancers from The Cancer Genome Atlas (TCGA) database. We use machine learning techniques to create an accurate pan-cancer diagnosis system and a prediction model for treatment outcomes. Finally, using these models, we create a web-based tool that diagnoses cancer and recommends the best treatment options.
We achieved 97.2% classification accuracy using a support vector machine classifier with a radial basis function kernel. Accuracy improved to 99.9-100% when climbing up the embryonic tree and classifying cancers at different stages. We define accuracy as the ratio of the number of correctly classified instances to the total number of instances. The classifier also performed well on independent validation datasets, achieving greater than 80% sensitivity for many cancer types. Many miRNAs selected by our feature selection algorithm had strong previously reported associations with various cancers and with tumor progression. Then, by encoding miRNA, clinical, and treatment data in a machine-readable format, we built a prognosis model that predicts the outcome of treatment with 85% accuracy. We used this model to create a tool that recommends personalized treatment regimens. Both the diagnosis and prognosis models, which incorporate semi-supervised learning techniques to improve their accuracy with repeated use, were uploaded online for easy access.
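The two metrics quoted above can be made concrete with a minimal sketch. It assumes a multi-class setting in which each sample carries one true and one predicted cancer-type label, and it reads sensitivity as per-class recall. The function names and the toy labels are hypothetical and chosen only for illustration; they are not data or code from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Correctly classified instances divided by total instances."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one instance is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_sensitivity(y_true, y_pred):
    """Per-class recall: true positives divided by all instances of the class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in totals.items()}


# Hypothetical labels for illustration only (not results from the study).
y_true = ["BRCA", "BRCA", "LUAD", "LUAD", "KIRC"]
y_pred = ["BRCA", "LUAD", "LUAD", "LUAD", "KIRC"]

print(accuracy(y_true, y_pred))
print(per_class_sensitivity(y_true, y_pred))
```

Reporting sensitivity per cancer type alongside overall accuracy matters in a pan-cancer setting, because a high overall accuracy can hide poor recall on cancer types with few samples.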
Our research is a step towards the ultimate goal of diagnosing cancer and recommending treatment using non-invasive blood tests.