Park Jee Soo, Choi Soo Beom, Chung Jai Won, Kim Sung Woo, Kim Deok Won
Annu Int Conf IEEE Eng Med Biol Soc. 2014;2014:3430-3. doi: 10.1109/EMBC.2014.6944360.
Ovarian cancer, the most fatal of the reproductive cancers, is the fifth leading cause of cancer death among women in the United States. Serous borderline ovarian tumors (SBOTs) are considered to be earlier or less malignant forms of serous ovarian carcinomas (SOCs). SBOTs are asymptomatic, and progression to advanced stages is common. Using DNA microarray data, we designed multicategory classification models to discriminate among ovarian cancer subclasses. To develop models with optimal parameters and features, we systematically evaluated three machine learning algorithms and three feature selection methods using five-fold cross-validation and a grid search. The microarray data, comprising 54,675 probe sets, were obtained from the National Center for Biotechnology Information Gene Expression Omnibus repository and covered 22 subjects with normal ovarian surface epithelial cells, 12 with SBOTs, and 79 with SOCs. The optimal model, a one-versus-rest support vector machine with signal-to-noise feature selection, achieved an accuracy of 97.3%, a relative classifier information of 0.916, and a kappa index of 0.941. In addition, five features, including the expression of the putative biomarkers SNTN and AOX1, were selected to differentiate among the normal, SBOT, and SOC groups. Accurate diagnosis of ovarian tumor subclasses by multicategory machine learning would be cost-effective and simple to perform, and would support more effective subclass-targeted therapy.
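As a rough illustration of two quantities named in the abstract, the sketch below ranks features by a one-versus-rest signal-to-noise score and computes a kappa index from a confusion matrix. The abstract does not give either formula, so the code assumes the commonly used Golub-style score, (mean of class minus mean of rest) divided by (SD of class plus SD of rest), and the standard Cohen's kappa. The function names and the toy data are hypothetical and are not taken from the study; the support vector machine, cross-validation, and grid search steps are not shown.

```python
from statistics import mean, stdev


def signal_to_noise(values, labels, positive):
    """One-versus-rest signal-to-noise score for a single feature.

    Assumed definition: (mean_pos - mean_rest) / (sd_pos + sd_rest).
    Each group needs at least two samples for the sample SD.
    """
    pos = [v for v, y in zip(values, labels) if y == positive]
    rest = [v for v, y in zip(values, labels) if y != positive]
    denom = stdev(pos) + stdev(rest)
    return (mean(pos) - mean(rest)) / denom if denom else 0.0


def rank_features(matrix, labels, positive, top_k):
    """Return indices of the top_k features by absolute score.

    matrix is a list of samples; each sample is a list of feature values.
    """
    n_features = len(matrix[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in matrix]
        scores.append(abs(signal_to_noise(column, labels, positive)))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:top_k]


def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows true, columns predicted)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(n)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Synthetic toy data (NOT from the study): 6 samples, 3 features.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 5.1, 0.9],
    [0.9, 4.9, 0.4],
    [3.0, 5.0, 0.8],
    [3.2, 5.2, 0.3],
    [2.9, 4.8, 0.5],
]
y = ["normal", "normal", "normal", "SOC", "SOC", "SOC"]

# Feature 0 separates the two groups cleanly, so it ranks first.
top = rank_features(X, y, positive="SOC", top_k=1)
```

In a three-class setting like the one described, the same score would be computed once per class against the remaining two, and the selected features would then be passed to the classifier; how the study combined the per-class rankings is not stated in the abstract.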