
Evaluating the Predictability of Cancer Types from 536 Somatic Mutations: A New Dataset.

Authors

Dehkharghanian Taher, Rahnamayan Shahryar, Tizhoosh Hamid R

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:5308-5311. doi: 10.1109/EMBC44109.2020.9176699.

DOI: 10.1109/EMBC44109.2020.9176699
PMID: 33019182

Abstract

In this paper, we introduce a new dataset for cancer research containing somatic mutation states of 536 genes of the Cancer Gene Census (CGC). We used somatic mutation information from the Cancer Genome Atlas (TCGA) projects to create this dataset. As preliminary investigations, we employed machine learning techniques, including k-Nearest Neighbors, Decision Tree, Random Forest, and Artificial Neural Networks (ANNs) to evaluate the potential of these somatic mutations for classification of cancer types. We compared our models on accuracy, precision, recall, and F1-score. We observed that ANNs outperformed the other models with F1-score of 0.36 and overall classification accuracy of 40%, and precision ranging from 12% to 92% for different cancer types. The 40% accuracy is significantly higher than random guessing which would have resulted in 3% overall classification accuracy. Although the model has relatively low overall accuracy, it has an average classification specificity of 98%. The ANN achieved high precision scores (> 0.7) for 5 of the 33 cancer types. The introduced dataset can be used for research on TCGA data, such as survival analysis, histopathology image analysis and content-based image retrieval. The dataset is available online for download: https://kimialab.uwaterloo.ca/kimia/.
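
The metrics named in the abstract (accuracy, precision, recall, F1-score, specificity) can all be derived from a confusion matrix, and the roughly 3% random-guessing baseline matches uniform guessing over 33 classes (1/33). The sketch below is illustrative only and is not the authors' code: the 3-class confusion matrix is made-up toy data, the function names are hypothetical, and macro-averaging is an assumption, since the abstract does not say how the F1-score or the average specificity were aggregated.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """One-vs-rest metrics per class; cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                          # class k samples predicted as other classes
        fp = sum(cm[i][k] for i in range(n)) - tp     # other classes predicted as class k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"precision": precision, "recall": recall,
                        "specificity": specificity, "f1": f1})
    return results


def accuracy(cm: List[List[int]]) -> float:
    """Overall accuracy: correctly classified samples divided by all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


def macro_average(metrics: List[Dict[str, float]], key: str) -> float:
    """Unweighted mean of one metric across classes (an assumed aggregation)."""
    return sum(m[key] for m in metrics) / len(metrics)


def chance_accuracy(n_classes: int) -> float:
    """Expected accuracy of uniform random guessing over n_classes."""
    return 1.0 / n_classes


# Toy 3-class confusion matrix (hypothetical numbers, not from the paper).
toy_cm = [
    [5, 3, 2],
    [2, 4, 4],
    [3, 3, 4],
]

if __name__ == "__main__":
    metrics = per_class_metrics(toy_cm)
    print("accuracy:", accuracy(toy_cm))
    print("macro F1:", macro_average(metrics, "f1"))
    print("mean specificity:", macro_average(metrics, "specificity"))
    print("chance accuracy for 33 classes:", chance_accuracy(33))
```

Specificity counts true negatives for each class in a one-vs-rest sense. With 33 classes, nearly every sample is a negative for any given class, so specificity can stay high even when overall accuracy is modest, which is consistent with the 98% versus 40% figures reported above.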

Similar articles

1
Evaluating the Predictability of Cancer Types from 536 Somatic Mutations: A New Dataset.
Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:5308-5311. doi: 10.1109/EMBC44109.2020.9176699.
2
CPEM: Accurate cancer type classification based on somatic alterations using an ensemble of a random forest and a deep neural network.
Sci Rep. 2019 Nov 15;9(1):16927. doi: 10.1038/s41598-019-53034-3.
3
Application of supervised machine learning algorithms in the classification of sagittal gait patterns of cerebral palsy children with spastic diplegia.
Comput Biol Med. 2019 Mar;106:33-39. doi: 10.1016/j.compbiomed.2019.01.009. Epub 2019 Jan 16.
4
Cancer Type Prediction and Classification Based on RNA-sequencing Data.
Annu Int Conf IEEE Eng Med Biol Soc. 2018 Jul;2018:5374-5377. doi: 10.1109/EMBC.2018.8513521.
5
Machine learning models in breast cancer survival prediction.
Technol Health Care. 2016;24(1):31-42. doi: 10.3233/THC-151071.
6
Statistical characterization and classification of colon microarray gene expression data using multiple machine learning paradigms.
Comput Methods Programs Biomed. 2019 Jul;176:173-193. doi: 10.1016/j.cmpb.2019.04.008. Epub 2019 Apr 10.
7
Transcriptome profiling by combined machine learning and statistical R analysis identifies TMEM236 as a potential novel diagnostic biomarker for colorectal cancer.
Sci Rep. 2021 Jul 12;11(1):14304. doi: 10.1038/s41598-021-92692-0.
8
Identifying Cancer Drivers Using DRIVE: A Feature-Based Machine Learning Model for a Pan-Cancer Assessment of Somatic Missense Mutations.
Cancers (Basel). 2021 Jun 3;13(11):2779. doi: 10.3390/cancers13112779.
9
Prediction of pathologic stage in non-small cell lung cancer using machine learning algorithm based on CT image feature analysis.
BMC Cancer. 2019 May 17;19(1):464. doi: 10.1186/s12885-019-5646-9.
10
Optimisation of cancer classification by machine learning generates an enriched list of candidate drug targets and biomarkers.
Mol Omics. 2020 Apr 1;16(2):113-125. doi: 10.1039/c9mo00198k. Epub 2020 Feb 25.