Unsupervised and self-supervised deep learning approaches for biomedical text mining.

Affiliation

Université de Paris, CNRS, Centre Borelli, France.

Publication information

Brief Bioinform. 2021 Mar 22;22(2):1592-1603. doi: 10.1093/bib/bbab016.

DOI: 10.1093/bib/bbab016
PMID: 33569575

Abstract

Biomedical scientific literature is growing at a very rapid pace, which makes it increasingly difficult for human experts to spot the most relevant results hidden in the papers. Automated information extraction tools based on text mining techniques are therefore needed to assist them in this task. In the last few years, techniques based on deep neural networks have contributed significantly to advancing the state of the art in this research area. Although the contribution of supervised methods to this progress is relatively well known, the same is not true of other kinds of learning, namely unsupervised and self-supervised learning. Unsupervised learning does not incur the cost of creating labels, which is very useful in the exploratory stages of a biomedical study, where agile techniques are needed to rapidly explore many paths. In particular, clustering techniques applied to biomedical text mining make it possible to gather large sets of documents into more manageable groups, and deep learning techniques have made it possible to produce new clustering-friendly representations of the data. Self-supervised learning, on the other hand, is a kind of supervised learning in which the labels do not have to be created manually by humans but are derived automatically from relations found in the input texts. In combination with innovative network architectures (e.g. transformer-based architectures), self-supervised techniques have made it possible to design increasingly effective vector-based word representations (word embeddings). We show in this survey how word representations obtained in this way have proven to interact successfully with common supervised modules (e.g. classification networks), to whose performance they greatly contribute.
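
The idea that self-supervised labels are derived from the input text itself can be illustrated with a minimal sketch. It assumes a masked-word prediction setup and whitespace tokenization; the function name `make_masked_examples`, the `[MASK]` placeholder, and the example sentence are illustrative choices, not details taken from the survey.

```python
MASK = "[MASK]"  # placeholder token; an illustrative assumption


def make_masked_examples(sentence):
    """Build (masked_input, label) pairs from raw text.

    Each token is hidden in turn, and the hidden token becomes the label,
    so no human annotation is needed to create the training targets.
    """
    tokens = sentence.split()  # naive whitespace tokenization (assumption)
    examples = []
    for i, token in enumerate(tokens):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        examples.append((" ".join(masked), token))
    return examples


examples = make_masked_examples("aspirin inhibits platelet aggregation")
for masked_input, label in examples:
    print(f"{masked_input!r} -> {label!r}")
```

Every pair produced this way could serve as a supervised training example for a word-representation model, even though the labels come entirely from the text.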
