Lanza Ben, Parashar Deepak
Statistics and Epidemiology Unit, Warwick Medical School, University of Warwick, Coventry, UK.
Warwick Cancer Research Centre, University of Warwick, Coventry, UK.
Arch Proteom Bioinform. 2021;2(1):20-38.
Biomarkers are a key driver of targeted cancer therapies: they either stratify patients into risk categories or identify the patient subgroups most likely to benefit. However, the ability of a biomarker to stratify patients relies heavily on the type of clinical endpoint data collected. Of particular interest is the case of a continuous biomarker, where the challenge is often to identify cut-offs or thresholds that stratify the population according to the level of clinical outcome or treatment benefit. Meanwhile, well-established Machine Learning (ML) methods such as Support Vector Machines (SVMs) classify data, whether linearly or non-linearly separable, into subgroups in an optimal way. SVMs have proven immensely useful in data-centric engineering, and researchers have recently begun to explore their application in healthcare. Despite their wide applicability, SVMs are not yet part of the mainstream toolkit for observational clinical studies or clinical trials. This research investigates the role of SVMs in stratifying a patient population based on a continuous biomarker across a variety of datasets. Building on the mathematical framework underlying SVMs, we formulate and fit algorithms to biomarker-stratified cancer datasets to evaluate their merits. The analysis shows that SVMs outperform other ML methods for certain data types, suggesting that they may provide a robust yet simple way to stratify real cancer patients based on continuous biomarkers, and hence accelerate the identification of subgroups for improved clinical outcomes or guide targeted cancer therapies.
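As a minimal illustration of the idea, and not the authors' algorithm, the sketch below fits a one-dimensional soft-margin linear SVM by plain subgradient descent and reads a biomarker cut-off off the decision boundary, where w*x + b = 0. The biomarker values, the responder labels, the function name `fit_linear_svm_1d` and the hyperparameters `lam`, `step` and `n_iter` are all assumptions made for illustration; none of them come from the study.

```python
"""Toy 1-D soft-margin linear SVM used to derive a biomarker cut-off."""


def fit_linear_svm_1d(x, y, lam=0.01, step=0.05, n_iter=20000):
    """Minimise lam/2 * w**2 + mean(hinge loss) by full-batch subgradient descent.

    x: biomarker values; y: labels in {-1, +1}.
    Returns (w, b) for the decision function f(x) = w * x + b.
    """
    n = len(x)
    centre = sum(x) / n              # centre the biomarker so w and b update on similar scales
    z = [xi - centre for xi in x]
    w, b = 0.0, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = lam * w, 0.0
        for zi, yi in zip(z, y):
            if yi * (w * zi + b) < 1.0:   # margin violated: this hinge term is active
                grad_w -= yi * zi / n
                grad_b -= yi / n
        w -= step * grad_w
        b -= step * grad_b
    # Undo the centring: w * (x - centre) + b == w * x + (b - w * centre)
    return w, b - w * centre


# Hypothetical biomarker measurements (made-up values with some overlap).
non_responders = [1.2, 1.8, 2.1, 2.5, 3.0, 3.4]
responders = [3.2, 3.9, 4.4, 4.8, 5.5, 6.1]

x = non_responders + responders
y = [-1] * len(non_responders) + [1] * len(responders)

w, b = fit_linear_svm_1d(x, y)
cutoff = -b / w                      # biomarker value where the decision function is zero

predicted = [1 if w * xi + b > 0 else -1 for xi in x]
accuracy = sum(p == t for p, t in zip(predicted, y)) / len(y)

print(f"cut-off = {cutoff:.2f}, training accuracy = {accuracy:.2f}")
```

In practice a library implementation such as scikit-learn's `SVC` would normally replace the hand-written optimiser, and a non-linear kernel would be needed when a single cut-off cannot separate the groups.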