Tkachev Victor, Sorokin Maxim, Mescheryakov Artem, Simonov Alexander, Garazha Andrew, Buzdin Anton, Muchnik Ilya, Borisov Nicolas
Department of Bioinformatics and Molecular Networks, OmicsWay Corporation, Walnut, CA, United States.
Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Moscow, Russia.
Front Genet. 2019 Jan 15;9:717. doi: 10.3389/fgene.2018.00717. eCollection 2018.
Here, we propose a heuristic data-trimming technique for SVM, termed FLOating Window Projective Separator (FloWPS), tailored for personalized predictions based on molecular data. The procedure can operate on high-throughput genetic datasets such as gene expression or mutation profiles. It prevents the SVM from extrapolating by excluding non-informative features. FloWPS requires training on data from individuals with known clinical outcomes to create a clinically relevant classifier. The genetic profiles linked with the outcomes are split, as usual, into training and validation datasets. The unique property of FloWPS is that irrelevant features of a validation point, namely those that do not have a sufficient number of neighboring hits in the training dataset, are removed from further analysis. Next, similarly to the k nearest neighbors (kNN) method, for each point of the validation dataset FloWPS takes into account only the proximal points of the training dataset. Thus, for every point of the validation dataset, the training dataset is adjusted to form a floating window. FloWPS performance was tested on ten gene expression datasets covering 992 cancer patients who either responded or did not respond to different types of chemotherapy. Leave-one-out cross-validation confirmed that FloWPS significantly improves the quality of a classifier built on the classical SVM in most of the applications, particularly for polynomial kernels.
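The two trimming steps described above (per-point feature exclusion followed by kNN-like selection of proximal training points) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the parameters `m` and `k`, and the relevance criterion (a feature is kept only if at least `m` training values lie on each side of the query value) are assumptions made for illustration, and the SVM fit itself is left out: the resulting window would be passed to any SVM library.

```python
import math


def trim_features(train, query, m):
    """Return indices of features judged relevant for `query`.

    Assumed criterion (illustrative only): keep feature j if at least `m`
    training values lie strictly below and at least `m` strictly above the
    query value, so the classifier never has to extrapolate along j.
    """
    kept = []
    for j, q in enumerate(query):
        below = sum(1 for row in train if row[j] < q)
        above = sum(1 for row in train if row[j] > q)
        if below >= m and above >= m:
            kept.append(j)
    return kept


def floating_window(train, labels, query, m, k):
    """Build the per-query training subset (the "floating window").

    Returns the kept feature indices, the k nearest training rows restricted
    to those features, their labels, and the query restricted likewise.
    """
    kept = trim_features(train, query, m)

    def dist(row):
        # Euclidean distance in the trimmed feature space.
        return math.sqrt(sum((row[j] - query[j]) ** 2 for j in kept))

    nearest = sorted(range(len(train)), key=lambda i: dist(train[i]))[:k]
    window_x = [[train[i][j] for j in kept] for i in nearest]
    window_y = [labels[i] for i in nearest]
    query_x = [query[j] for j in kept]
    return kept, window_x, window_y, query_x


# Toy data: the query lies outside the training range along feature 1.
train = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [4.0, 13.0]]
labels = [0, 0, 1, 1]
query = [2.5, 20.0]

kept, window_x, window_y, query_x = floating_window(train, labels, query, m=1, k=2)
```

In this toy case feature 1 is dropped because no training value exceeds the query value along it, and only the two training rows closest to the query along feature 0 remain in the window. In a leave-one-out setting, each sample would take its turn as `query` while the remaining samples form `train`.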