Decision Support System and Web-Application Using Supervised Machine Learning Algorithms for Easy Cancer Classifications.

Authors

Chandrashekar K, Setlur Anagha S, Sabhapathi C Adithya, Raiker Satyam Suresh, Singh Satyam, Niranjan Vidya

Affiliation

Department of Biotechnology, R V College of Engineering, Bengaluru, Karnataka, India.

Publication

Cancer Inform. 2023 Jan 23;22:11769351221147244. doi: 10.1177/11769351221147244. eCollection 2023.

Abstract

A decision support system (DSS) that classifies various cancers helps clinicians and researchers make better decisions, aiding early cancer diagnosis and reducing the chance of incorrect diagnosis. This work therefore aimed to design a classification model that accurately predicts 5 different cancer types, represented by 20 cancer exomes, using the mutations identified from whole-exome cancer analysis. Initially, a basic model was designed using supervised machine learning classification algorithms, namely K-nearest neighbor (KNN), support vector machine (SVM), decision tree, naïve Bayes and random forest (RF); of these, decision tree and random forest performed better in terms of preliminary model accuracy. However, the output predictions were incorrect owing to low training scores. Thus, 16 essential features were then selected for model improvement, which was carried out using 2 approaches. All imbalanced datasets were balanced using SMOTE. In the first approach, all features from the 20 cancer exome datasets were used for training, and models were designed using decision tree and random forest. On the balanced datasets, the decision tree model showed an accuracy of 77%, while the RF model improved the accuracy to 82% and predicted all 5 cancer types correctly. The area under the curve for the RF model was closer to 1 than that for the decision tree model. In the second approach, 15 datasets were used for training and 5 for testing; however, only 2 cancer types were predicted correctly. To cross-validate the RF model, the Matthews correlation coefficient (MCC) was computed. For the first approach, the MCC test and MCC cross-validation values were 0.7796 and 0.9356, respectively. Likewise, for the second approach, the MCC was 0.9365, corroborating the accuracy of the designed model. The model was successfully deployed as a web application using Streamlit for easy use. This study presents insights for enabling easy cancer classification.
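The abstract reports MCC values but does not say how they were computed. As an illustration only, the following is a minimal pure-Python sketch of the standard multiclass generalization of the Matthews correlation coefficient, computed from true and predicted labels. The function name `multiclass_mcc` and the labels `type_A`, `type_B`, `type_C` are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient.

    Illustrative sketch only; not the authors' implementation.
    Returns a value in [-1, 1], or 0.0 when the denominator is zero
    (for example, when every prediction is the same class).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)                                    # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    true_counts = Counter(y_true)                      # t_k: occurrences of class k
    pred_counts = Counter(y_pred)                      # p_k: predictions of class k
    labels = set(true_counts) | set(pred_counts)

    # Numerator: c * s - sum_k(t_k * p_k)
    numerator = c * s - sum(true_counts[k] * pred_counts[k] for k in labels)
    # Denominator: sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    spread_true = s * s - sum(v * v for v in true_counts.values())
    spread_pred = s * s - sum(v * v for v in pred_counts.values())
    denominator = sqrt(spread_true * spread_pred)

    return numerator / denominator if denominator else 0.0


# Hypothetical toy labels, chosen only to exercise the function.
toy_true = ["type_A", "type_A", "type_B", "type_B", "type_C"]
toy_pred = ["type_A", "type_B", "type_B", "type_B", "type_C"]
print(round(multiclass_mcc(toy_true, toy_pred), 4))
```

Unlike plain accuracy, this score takes every cell of the confusion matrix into account, which is why it is often reported alongside accuracy for multiclass models.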
