Application of BERT to Enable Gene Classification Based on Clinical Evidence.

Affiliations

National Pilot School of Software, Yunnan University, Kunming, 650091, China.

Department of Mathematics, The Ohio State University, Columbus, OH 43210, USA.

Publication information

Biomed Res Int. 2020 Oct 7;2020:5491963. doi: 10.1155/2020/5491963. eCollection 2020.

Abstract

The identification of profiled cancer-related genes plays an essential role in cancer diagnosis and treatment. According to the literature, genetic mutations are still classified manually. Manual classification of genetic mutations is pathologist-dependent, subjective, and time-consuming. To improve the accuracy of clinical interpretation, scientists have proposed computational approaches for the automatic analysis of mutations since the advent of next-generation sequencing technologies. Nevertheless, challenges such as multiple classifications, the complexity of texts, redundant descriptions, and inconsistent interpretation have limited the development of algorithms. To overcome these difficulties, we adapted a deep learning method named Bidirectional Encoder Representations from Transformers (BERT) to classify genetic mutations based on text evidence from an annotated database. During training, three challenging features of the data were addressed: the extreme length of texts, biased data presentation, and high repeatability. Finally, BERT+abstract demonstrates satisfactory results, with a logarithmic loss of 0.80, a recall of 0.6837, and an F-measure of 0.705. It is feasible for BERT to classify genomic mutation text within literature-based datasets. Consequently, BERT is a practical tool for facilitating and significantly speeding up cancer research on tumor progression, diagnosis, and the design of more precise and effective treatments.
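For readers unfamiliar with the three metrics reported in the abstract, the following is a minimal sketch of how a multi-class logarithmic loss, recall, and F-measure can be computed from predicted class probabilities. It is not the authors' evaluation code. The abstract does not say how recall and F-measure were averaged across classes, so support-weighted averaging is an assumption here, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip to avoid log(0) when a model assigns zero probability.
        p_true = min(max(p[label], eps), 1 - eps)
        total += -math.log(p_true)
    return total / len(y_true)


def weighted_recall_f1(y_true, y_pred):
    """Support-weighted recall and F-measure (averaging is an assumption)."""
    support = Counter(y_true)
    n = len(y_true)
    recall_sum = 0.0
    f1_sum = 0.0
    for c in sorted(support):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = support[c] - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        recall_sum += support[c] * recall
        f1_sum += support[c] * f1
    return recall_sum / n, f1_sum / n


# Toy data (invented for illustration, not taken from the paper):
# four texts, three mutation classes, one probability row per text.
y_true = [0, 1, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.6, 0.1],
]
# Predicted class = index of the highest probability in each row.
y_pred = [max(range(len(p)), key=p.__getitem__) for p in probs]

loss = log_loss(y_true, probs)
recall, f1 = weighted_recall_f1(y_true, y_pred)
print(f"log loss={loss:.4f} recall={recall:.4f} F-measure={f1:.4f}")
```

Logarithmic loss penalizes confident wrong predictions, so it depends on the predicted probabilities, whereas recall and F-measure depend only on the predicted labels.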

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9759/7563092/0edfd3867c16/BMRI2020-5491963.001.jpg
