Oh Jung Hun, Gao Jean, Nandi Animesh, Gurnani Prem, Knowles Lynne, Schorge John
Department of Computer Science and Engineering, The University of Texas, Arlington, 76019, USA.
Genome Inform. 2005;16(2):195-204.
Surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry data are increasingly analyzed to identify biomarkers that aid early detection of disease. Ovarian cancer recurs at a rate of 75%, within a few months to several years after standard treatment. Because recurrent ovarian cancer is relatively difficult to diagnose and small tumors generally respond better to treatment, new methods for detecting early relapse in ovarian cancer are urgently needed. Here, we propose a new algorithm, SVM-MB/RFE (SVM-Markov Blanket/Recursive Feature Elimination), based on SVM-RFE, which identifies biomarkers for predicting the early recurrence of ovarian cancer. In this approach, we first apply a t-test for feature pruning and then perform binning, using 5-fold cross-validation. This yields 58 peaks from the 27,000 points of the raw data. Such a dramatic reduction in features relaxes the computational burden in the next step of our algorithm. We compare the performance of three feature selection algorithms and demonstrate that SVM-MB/RFE outperforms the other methods.
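The pruning-then-binning step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the test variant, the threshold, or the binning rule, so the sketch assumes Welch's t statistic, a hypothetical cut-off `threshold`, and a hypothetical rule that merges neighbouring retained points into one bin (`max_gap`). The data are synthetic, and the 5-fold cross-validation and the SVM-MB/RFE stage are not shown.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic (assumed variant; the abstract says only 't-test')."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / se


def prune_by_ttest(X, y, threshold):
    """Return indices of features whose |t| between the two classes is >= threshold."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    kept = []
    for j in range(len(X[0])):
        t = welch_t([r[j] for r in pos], [r[j] for r in neg])
        if abs(t) >= threshold:
            kept.append(j)
    return kept


def bin_adjacent(indices, max_gap=1):
    """Group sorted feature indices into bins of neighbouring points (hypothetical rule)."""
    bins = []
    for j in indices:
        if bins and j - bins[-1][-1] <= max_gap:
            bins[-1].append(j)  # close to the previous point: same bin
        else:
            bins.append([j])    # otherwise start a new bin
    return bins


# Synthetic stand-in for spectra: 30 samples, 200 intensity points each.
random.seed(0)
n_features = 200
y = [1] * 15 + [0] * 15
X = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in y]

# Plant one artificial "peak" spanning points 50..54 in the positive class.
for row, label in zip(X, y):
    if label == 1:
        for j in range(50, 55):
            row[j] += 8.0

kept = prune_by_ttest(X, y, threshold=8.0)
peaks = bin_adjacent(kept)
print(len(kept), "points kept in", len(peaks), "bin(s)")
```

In a pipeline of the kind the abstract describes, the binned peaks rather than the raw points would be passed to the subsequent feature selection stage, which is where the reduced feature count lowers the computational cost.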