Decision Support System and Web-Application Using Supervised Machine Learning Algorithms for Easy Cancer Classifications.

Authors

Chandrashekar K, Setlur Anagha S, Sabhapathi C Adithya, Raiker Satyam Suresh, Singh Satyam, Niranjan Vidya

Affiliation

Department of Biotechnology, R V College of Engineering, Bengaluru, Karnataka, India.

Publication

Cancer Inform. 2023 Jan 23;22:11769351221147244. doi: 10.1177/11769351221147244. eCollection 2023.

DOI:10.1177/11769351221147244
PMID:36714384
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9880585/
Abstract

A decision support system (DSS) that classifies various cancers supports clinicians and researchers in making better decisions that can aid early cancer diagnosis, thereby reducing the chance of incorrect diagnosis. This work therefore aimed to design a classification model that can accurately predict 5 different cancer types, comprising 20 cancer exomes, using the mutations identified from whole exome cancer analysis. Initially, a basic model was designed using supervised machine learning classification algorithms, namely K-nearest neighbor (KNN), support vector machine (SVM), decision tree, naïve Bayes and random forest (RF), among which decision tree and random forest performed better in terms of preliminary model accuracy. However, output predictions were incorrect because of low training scores. Thus, 16 essential features were selected for model improvement, which followed 2 approaches. All imbalanced datasets were balanced using SMOTE. In the first approach, models were trained on all features from the 20 cancer exome datasets using decision tree and random forest. On the balanced datasets, the decision tree model showed an accuracy of 77%, while the RF model improved the accuracy to 82% and predicted all 5 cancer types correctly. The area under the curve for the RF model was closer to 1 than that of the decision tree model. In the second approach, 15 datasets were used for training and 5 for testing. However, only 2 cancer types were predicted correctly. To cross-validate the RF model, a Matthews correlation coefficient (MCC) test was performed. For the first approach, the MCC test and MCC cross-validation values were 0.7796 and 0.9356, respectively. Likewise, for the second approach, the MCC was 0.9365, corroborating the accuracy of the designed model. The model was successfully deployed as a web application using Streamlit for easy use. This study presents insights for enabling easy cancer classification.
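The abstract reports MCC values for a 5-class problem, so the multiclass generalization of the coefficient applies. The following is a minimal sketch of that calculation in plain Python, not the authors' code: the function name `multiclass_mcc` and the class labels are hypothetical, and the example labels are illustrative rather than data from the study.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient generalized to K classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified samples
    t_k = Counter(y_true)                             # true occurrences per class
    p_k = Counter(y_pred)                             # predicted occurrences per class
    labels = set(t_k) | set(p_k)
    numerator = c * s - sum(t_k[k] * p_k[k] for k in labels)
    denominator = sqrt(
        (s * s - sum(v * v for v in p_k.values()))
        * (s * s - sum(v * v for v in t_k.values()))
    )
    # Common convention: report 0 when the coefficient is undefined
    # (for example, when every sample has the same true and predicted class).
    return numerator / denominator if denominator else 0.0


# Hypothetical labels for illustration only; not data from the study.
y_true = ["breast", "breast", "lung", "lung", "colon", "colon"]
y_pred = ["breast", "breast", "lung", "colon", "colon", "colon"]
score = multiclass_mcc(y_true, y_pred)
print(f"MCC = {score:.4f}")
```

Unlike plain accuracy, the coefficient uses the per-class counts of both true and predicted labels, which is why it is often preferred when classes were imbalanced before resampling. A value of 1 indicates perfect agreement and a value near 0 indicates chance-level prediction.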

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/782e/9880585/2458a7a0ab42/10.1177_11769351221147244-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/782e/9880585/bf06c623ba38/10.1177_11769351221147244-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/782e/9880585/84ee3a303ec3/10.1177_11769351221147244-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/782e/9880585/de4ce13feb2b/10.1177_11769351221147244-fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/782e/9880585/c1e9cd782419/10.1177_11769351221147244-fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/782e/9880585/fafe49fe577c/10.1177_11769351221147244-fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/782e/9880585/1907157a814b/10.1177_11769351221147244-fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/782e/9880585/654fab7ccbde/10.1177_11769351221147244-fig8.jpg
