Huang Cai, Mezencev Roman, McDonald John F, Vannberg Fredrik
School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, United States of America.
Parker H. Petit Institute for Bioengineering and Bioscience, Georgia Institute of Technology, Atlanta, Georgia, United States of America.
PLoS One. 2017 Oct 26;12(10):e0186906. doi: 10.1371/journal.pone.0186906. eCollection 2017.
Precision medicine is a rapidly growing area of modern medical science, and open-source machine-learning code promises to be a critical component in the successful development of standardized and automated analysis of patient data. One important goal of precision cancer medicine is the accurate prediction of optimal drug therapies from the genomic profiles of individual patient tumors. We introduce an open-source software platform that employs a highly versatile support vector machine (SVM) algorithm combined with a standard recursive feature elimination (RFE) approach to predict personalized drug responses from gene expression profiles. Drug-specific models were built using gene expression and drug response data from the National Cancer Institute panel of 60 human cancer cell lines (NCI-60). The models are highly accurate in predicting the drug responsiveness of a variety of cancer cell lines, including those comprising the recent NCI-DREAM Challenge. We demonstrate that predictive accuracy is optimized when the learning dataset utilizes all probe-set expression values from a diversity of cancer cell types, without pre-filtering for genes generally considered to be "drivers" of cancer onset/progression. Application of our models to publicly available ovarian cancer (OC) patient gene expression datasets generated predictions consistent with observed responses previously reported in the literature. By making our algorithm open source, we hope to facilitate its testing in a variety of cancer types and contexts, leading to community-driven improvements and refinements in subsequent applications.
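The abstract names the technique (a support vector machine combined with recursive feature elimination) but gives no implementation details, so the following is only a minimal sketch of the general SVM-RFE idea and not the authors' platform. It assumes binary responder / non-responder labels encoded as +1 / -1, a linear SVM without a bias term trained by Pegasos-style subgradient descent, and a small synthetic dataset standing in for expression profiles. All function names, parameters, and data here are hypothetical. In practice a library such as scikit-learn (its `RFE` and `SVC` classes) would typically replace the hand-written trainer.

```python
import random


def train_linear_svm(X, y, features, epochs=50, lam=0.01):
    """Train a linear SVM (no bias) on the given feature indices.

    Uses Pegasos-style subgradient descent on the L2-regularized hinge loss.
    Labels must be +1 or -1. Returns a dict: feature index -> weight.
    """
    w = {j: 0.0 for j in features}
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            lr = 1.0 / (lam * t)  # decaying step size
            margin = yi * sum(w[j] * xi[j] for j in features)
            for j in features:
                w[j] *= 1.0 - lr * lam  # shrink weights (regularization)
                if margin < 1.0:  # sample violates the margin
                    w[j] += lr * yi * xi[j]
    return w


def svm_rfe(X, y, n_keep, drop_per_round=1):
    """Recursive feature elimination ranked by squared SVM weight.

    Repeatedly retrains the SVM on the surviving features and drops the
    ones with the smallest squared weight until n_keep features remain.
    Returns the surviving feature indices and the final weights.
    """
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        w = train_linear_svm(X, y, features)
        features.sort(key=lambda j: w[j] ** 2, reverse=True)
        n_drop = min(drop_per_round, len(features) - n_keep)
        features = features[: len(features) - n_drop]
    return sorted(features), train_linear_svm(X, y, features)


def predict(w, x):
    """Predict +1 (responder) or -1 (non-responder) for one profile."""
    score = sum(wj * x[j] for j, wj in w.items())
    return 1 if score >= 0 else -1


# Synthetic stand-in for expression data: feature 0 tracks the response
# label, features 1-5 are small uniform noise (hypothetical toy data).
rng = random.Random(0)
y = [1 if i % 2 == 0 else -1 for i in range(20)]
X = [[2.0 * yi] + [rng.uniform(-0.1, 0.1) for _ in range(5)] for yi in y]

selected, w = svm_rfe(X, y, n_keep=2)
print("selected features:", selected)
print("training accuracy:",
      sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(y))
```

The ranking criterion (squared weight of a linear model) is the usual choice for SVM-RFE; a real analysis would also hold out cell lines for validation rather than reporting training accuracy as done in this toy run.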