Breast Cancer Type Classification Using Machine Learning.

Authors

Wu Jiande, Hicks Chindo

Affiliation

Department of Genetics, School of Medicine, Louisiana State University Health Sciences Center, 533 Bolivar, New Orleans, LA 70112, USA.

Publication

J Pers Med. 2021 Jan 20;11(2):61. doi: 10.3390/jpm11020061.

Abstract

BACKGROUND

Breast cancer is a heterogeneous disease defined by molecular types and subtypes. Advances in genomic research have enabled the use of precision medicine in the clinical management of breast cancer. A critical unmet medical need is distinguishing triple negative breast cancer, the most aggressive and lethal form of breast cancer, from non-triple negative breast cancer. Here we propose a machine learning (ML) approach for classifying patients into triple negative and non-triple negative breast cancer using gene expression data.

METHODS

We analyzed RNA-sequencing data from 110 triple negative and 992 non-triple negative breast cancer tumor samples from The Cancer Genome Atlas to select the features (genes) used in the development and validation of the classification models. We evaluated four classification models (Support Vector Machine, K-nearest neighbor, Naïve Bayes, and Decision Tree), training each on features selected at different threshold levels to classify the two types of breast cancer. For performance evaluation and validation, the proposed methods were applied to independent gene expression datasets.
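The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses a synthetic matrix in place of the TCGA expression data, and swaps in a top-k ANOVA F-test filter (`SelectKBest`) for the thresholded feature selection, whose exact criterion the abstract does not state. The values of `k_values`, the linear kernel, and the number of neighbors are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a samples-by-genes expression matrix (NOT the TCGA data).
# The weights mimic the roughly 10% / 90% class imbalance given in the abstract.
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=15,
    weights=[0.9, 0.1], random_state=0,
)

# The four model families named in the abstract; hyperparameters are assumptions.
models = {
    "SVM": SVC(kernel="linear"),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}


def compare_models(X, y, k_values=(10, 50)):
    """Return {(model name, k): mean cross-validated balanced accuracy}."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    results = {}
    for k in k_values:
        for name, model in models.items():
            # Scaling and feature selection live inside the pipeline, so they
            # are refit on each training fold and never see the held-out fold.
            pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k), model)
            scores = cross_val_score(pipe, X, y, cv=cv, scoring="balanced_accuracy")
            results[(name, k)] = scores.mean()
    return results


results = compare_models(X, y)
for (name, k), score in sorted(results.items()):
    print(f"{name:14s} top-{k:3d} features: balanced accuracy {score:.3f}")
```

Balanced accuracy is used here because plain accuracy can look high on imbalanced classes even when the minority class is poorly recognized. The scores on synthetic data say nothing about how the four models rank on real expression data.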

RESULTS

Among the four ML algorithms evaluated, the Support Vector Machine classified breast cancer into triple negative and non-triple negative types more accurately, and with fewer misclassification errors, than the other three algorithms.
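To make "misclassification errors" concrete, the sketch below counts them from a two-class confusion matrix using the class sizes given in the Methods (110 triple negative, 992 non-triple negative). The predictions are a hypothetical trivial classifier that labels every sample non-triple negative; they are not results from the study. The function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # positive sample predicted as negative
        elif pred == positive:
            fp += 1  # negative sample predicted as positive
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Return the error count and basic rates derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "errors": fp + fn,  # misclassified samples are the off-diagonal cells
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Class sizes from the abstract: label 1 = triple negative, label 0 = non-triple negative.
y_true = [1] * 110 + [0] * 992
# Hypothetical baseline that predicts non-triple negative for every sample.
y_pred = [0] * len(y_true)

baseline = summarize(y_true, y_pred)
print(baseline)
```

This baseline misclassifies all 110 triple negative samples yet still reaches about 90% accuracy (992 of 1102), which is why error counts per class are more informative than overall accuracy when comparing classifiers on data this imbalanced.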

CONCLUSIONS

The prediction results show that ML algorithms are efficient and can be used for classification of breast cancer into triple negative and non-triple negative breast cancer types.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0f0b/7909418/ef2eb6225055/jpm-11-00061-g001.jpg
