Gene Mutation Classification through Text Evidence Facilitating Cancer Tumour Detection.

Affiliations

Department of Computer Science and Engineering, Chandigarh University, Ajitgarh, Punjab, India.

Digital Zhejiang Technology Operations Co., Ltd., Hangzhou, China.

Publication information

J Healthc Eng. 2021 Jul 27;2021:8689873. doi: 10.1155/2021/8689873. eCollection 2021.

DOI:10.1155/2021/8689873
PMID:34367540
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8337154/
Abstract

A cancer tumour contains thousands of genetic mutations. Despite advances in technology, distinguishing driver mutations, which promote tumour growth, from passenger (neutral) mutations is still done manually. This is a time-consuming process in which pathologists interpret every genetic mutation from the clinical evidence by hand. The clinical evidence falls into nine classes, but the classification criteria are still unknown. The main aim of this research is to propose a multiclass classifier that classifies genetic mutations based on clinical evidence (i.e., the text descriptions of these mutations) using Natural Language Processing (NLP) techniques. The dataset is taken from Kaggle and is provided by the Memorial Sloan Kettering Cancer Center (MSKCC); it was contributed by world-class researchers and oncologists. Three text transformation models, namely CountVectorizer, TfidfVectorizer, and Word2Vec, are used to convert the text into a numeric matrix (a matrix of token counts in the case of CountVectorizer). Three machine learning classification models, namely Logistic Regression (LR), Random Forest (RF), and XGBoost (XGB), along with a deep learning Recurrent Neural Network (RNN) model, are applied to the sparse matrix (keyword-count representation) of the text descriptions. The accuracy of each classifier is evaluated using a confusion matrix. The empirical results show that the RNN model performed better than the other classifiers, with the highest accuracy of 70%.
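
The two mechanical steps named in the abstract, converting text to a matrix of token counts and reading accuracy off a confusion matrix, can be illustrated with a minimal standard-library sketch. This is not the authors' code: the helper names (`count_matrix`, `confusion_matrix`, `accuracy_from_cm`), the whitespace tokenizer, the toy sentences, and the three toy classes are all assumptions made for illustration. The study itself used scikit-learn's CountVectorizer and nine classes.

```python
from collections import Counter


def count_matrix(docs):
    """Bag-of-words counts: one row per document, one column per vocabulary term.

    Simplified stand-in for a count vectorizer (lowercase + whitespace split);
    it does not reproduce scikit-learn's tokenization rules.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i][j] = number of samples whose true class is i and predicted class is j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy_from_cm(cm):
    """Accuracy = correctly classified samples (diagonal) / all samples."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Hypothetical toy inputs, not taken from the MSKCC dataset.
docs = ["missense mutation in kinase domain", "kinase domain deletion"]
vocab, X = count_matrix(docs)

y_true = [0, 1, 2, 2, 1]  # hypothetical true class labels (0-indexed)
y_pred = [0, 1, 1, 2, 1]  # hypothetical classifier predictions
cm = confusion_matrix(y_true, y_pred, n_classes=3)
acc = accuracy_from_cm(cm)

print(vocab)
print(X)
print(cm, acc)
```

In a real pipeline the count matrix would be fed to a classifier such as logistic regression, and the confusion matrix would have one row and one column per mutation class.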

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/ccea84175bd8/JHE2021-8689873.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/1edddde89a17/JHE2021-8689873.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/ea789844f81e/JHE2021-8689873.003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/08a4f973dfb7/JHE2021-8689873.004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/1b1ec5710495/JHE2021-8689873.005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/97600548cc8f/JHE2021-8689873.006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/97d5fc0be7f6/JHE2021-8689873.007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/22b42f887096/JHE2021-8689873.008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/c519a57a7477/JHE2021-8689873.009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/3ad9ae20eb91/JHE2021-8689873.010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/e43bdf2b1718/JHE2021-8689873.011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/1e964f3665ea/JHE2021-8689873.012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/9cca6ff64aeb/JHE2021-8689873.013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/a8d061ead1f1/JHE2021-8689873.014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/7a32b2866db9/JHE2021-8689873.015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/5c1161623e04/JHE2021-8689873.016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/d4a0ebecfb51/JHE2021-8689873.017.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/3301709ecc17/JHE2021-8689873.018.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/f453dbd4b349/JHE2021-8689873.019.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b1e/8337154/47ac0cd2d0d5/JHE2021-8689873.020.jpg

Similar articles

1
Gene Mutation Classification through Text Evidence Facilitating Cancer Tumour Detection.
J Healthc Eng. 2021 Jul 27;2021:8689873. doi: 10.1155/2021/8689873. eCollection 2021.
2
Natural Language Processing for Imaging Protocol Assignment: Machine Learning for Multiclass Classification of Abdominal CT Protocols Using Indication Text Data.
J Digit Imaging. 2022 Oct;35(5):1120-1130. doi: 10.1007/s10278-022-00633-8. Epub 2022 Jun 2.
3
Evaluating shallow and deep learning strategies for the 2018 n2c2 shared task on clinical text classification.
J Am Med Inform Assoc. 2019 Nov 1;26(11):1247-1254. doi: 10.1093/jamia/ocz149.
4
Deep-GenMut: Automated genetic mutation classification in oncology: A deep learning comparative study.
Heliyon. 2024 May 31;10(11):e32279. doi: 10.1016/j.heliyon.2024.e32279. eCollection 2024 Jun 15.
5
Classification of clinically actionable genetic mutations in cancer patients.
Front Mol Biosci. 2024 Jan 11;10:1277862. doi: 10.3389/fmolb.2023.1277862. eCollection 2023.
6
Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers.
Med Phys. 2018 Jul;45(7):3449-3459. doi: 10.1002/mp.12967. Epub 2018 Jun 13.
7
Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques.
Front Public Health. 2022 Nov 17;10:1031147. doi: 10.3389/fpubh.2022.1031147. eCollection 2022.
8
Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks.
J Am Med Inform Assoc. 2020 Jan 1;27(1):89-98. doi: 10.1093/jamia/ocz153.
9
Hate speech detection in the Arabic language: corpus design, construction, and evaluation.
Front Artif Intell. 2024 Feb 20;7:1345445. doi: 10.3389/frai.2024.1345445. eCollection 2024.
10
Identification of patients with carotid stenosis using natural language processing.
Eur Radiol. 2020 Jul;30(7):4125-4133. doi: 10.1007/s00330-020-06721-z. Epub 2020 Feb 26.

Cited by

1
Deep-GenMut: Automated genetic mutation classification in oncology: A deep learning comparative study.
Heliyon. 2024 May 31;10(11):e32279. doi: 10.1016/j.heliyon.2024.e32279. eCollection 2024 Jun 15.
2
Classification of clinically actionable genetic mutations in cancer patients.
Front Mol Biosci. 2024 Jan 11;10:1277862. doi: 10.3389/fmolb.2023.1277862. eCollection 2023.
3
Retracted: Gene Mutation Classification through Text Evidence Facilitating Cancer Tumour Detection.
J Healthc Eng. 2023 Dec 6;2023:9798514. doi: 10.1155/2023/9798514. eCollection 2023.

References

1
An Efficient Ciphertext-Policy Weighted Attribute-Based Encryption for the Internet of Health Things.
IEEE J Biomed Health Inform. 2022 May;26(5):1949-1960. doi: 10.1109/JBHI.2021.3075995. Epub 2022 May 5.
2
Gene mutation profile in patients with acquired pure red cell aplasia.
Ann Hematol. 2020 Aug;99(8):1749-1754. doi: 10.1007/s00277-020-04154-8. Epub 2020 Jun 27.
3
Comparison of deep learning models for natural language processing-based classification of non-English head CT reports.
Neuroradiology. 2020 Oct;62(10):1247-1256. doi: 10.1007/s00234-020-02420-0. Epub 2020 Apr 25.
4
Predictive Accuracy of a Polygenic Risk Score-Enhanced Prediction Model vs a Clinical Risk Score for Coronary Artery Disease.
JAMA. 2020 Feb 18;323(7):636-645. doi: 10.1001/jama.2019.22241.
5
Combining gene mutation with gene expression analysis improves outcome prediction in acute promyelocytic leukemia.
Blood. 2019 Sep 19;134(12):951-959. doi: 10.1182/blood.2019000239. Epub 2019 Jul 10.
6
Relevant Word Order Vectorization for Improved Natural Language Processing in Electronic Health Records.
Sci Rep. 2019 Jun 25;9(1):9253. doi: 10.1038/s41598-019-45705-y.
7
Heterozygous mutations cause genetic instability in a yeast model of cancer evolution.
Nature. 2019 Feb;566(7743):275-278. doi: 10.1038/s41586-019-0887-y. Epub 2019 Jan 30.
8
Classification of Widely and Rarely Expressed Genes with Recurrent Neural Network.
Comput Struct Biotechnol J. 2018 Dec 14;17:49-60. doi: 10.1016/j.csbj.2018.12.002. eCollection 2019.
9
Using natural language processing and machine learning to identify breast cancer local recurrence.
BMC Bioinformatics. 2018 Dec 28;19(Suppl 17):498. doi: 10.1186/s12859-018-2466-x.
10
The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers.
Nat Rev Cancer. 2018 Nov;18(11):696-705. doi: 10.1038/s41568-018-0060-1.