Application of BERT to Enable Gene Classification Based on Clinical Evidence

Affiliations

National Pilot School of Software, Yunnan University, Kunming, 650091, China.

Department of Mathematics, The Ohio State University, Columbus, OH 43210, USA.

Publication information

Biomed Res Int. 2020 Oct 7;2020:5491963. doi: 10.1155/2020/5491963. eCollection 2020.

DOI:10.1155/2020/5491963

PMID:33083472

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7563092/

Abstract

The identification of profiled cancer-related genes plays an essential role in cancer diagnosis and treatment. According to the literature, genetic mutations are still classified manually. Manual classification is pathologist-dependent, subjective, and time-consuming. To improve the accuracy of clinical interpretation, and with the advent of next-generation sequencing technologies, scientists have proposed computational approaches for the automatic analysis of mutations. Nevertheless, challenges such as multiple classifications, the complexity of texts, redundant descriptions, and inconsistent interpretation have limited the development of algorithms. To overcome these difficulties, we adapted a deep learning method, Bidirectional Encoder Representations from Transformers (BERT), to classify genetic mutations based on text evidence from an annotated database. During training, three challenging features of the data were addressed: the extreme length of the texts, biased data presentation, and high repeatability. BERT+abstract achieves satisfactory results, with a logarithmic loss of 0.80, a recall of 0.6837, and an F-measure of 0.705. These results indicate that it is feasible for BERT to classify genomic mutation text within literature-based datasets. Consequently, BERT is a practical tool for facilitating and significantly speeding up cancer research on tumor progression, diagnosis, and the design of more precise and effective treatments.
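The three metrics reported in the abstract (logarithmic loss, recall, and F-measure) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The abstract does not state how per-class scores are averaged, so the support-weighted averaging below is an assumption, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0) when a model outputs an exact 0 or 1.
        p = min(max(row[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


def weighted_recall_and_f1(y_true, y_pred):
    """Support-weighted recall and F1 (averaging scheme is an assumption)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    support = Counter(y_true)
    n = len(y_true)
    recall_sum = 0.0
    f1_sum = 0.0
    for c, s in support.items():
        recall = tp[c] / (tp[c] + fn[c])
        predicted = tp[c] + fp[c]
        precision = tp[c] / predicted if predicted else 0.0
        pr = precision + recall
        f1 = 2 * precision * recall / pr if pr else 0.0
        recall_sum += s * recall
        f1_sum += s * f1
    return recall_sum / n, f1_sum / n


# Toy data (hypothetical): 4 texts, 3 mutation classes.
y_true = [0, 1, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.5, 0.3, 0.2],  # misclassified: true class is 1
]
y_pred = [row.index(max(row)) for row in probs]

print("log loss:", multiclass_log_loss(y_true, probs))
print("recall, F1:", weighted_recall_and_f1(y_true, y_pred))
```

Log loss is computed from the predicted probabilities, so it penalizes confident wrong predictions. Recall and F-measure depend only on the predicted labels.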


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9759/7563092/0edfd3867c16/BMRI2020-5491963.001.jpg
