The Russian Drug Reaction Corpus and neural models for drug reactions and effectiveness detection in user reviews.

Affiliations

Chemoinformatics and Molecular Modeling Laboratory, The Alexander Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russian Federation.

Samsung-PDMI AI Center, Steklov Institute of Mathematics at St. Petersburg, St. Petersburg 191023, Russian Federation.

Publication information

Bioinformatics. 2021 Apr 19;37(2):243-249. doi: 10.1093/bioinformatics/btaa675.

DOI: 10.1093/bioinformatics/btaa675
PMID: 32722774

Abstract

MOTIVATION

Drugs and diseases play a central role in many areas of biomedical research and healthcare. Aggregating knowledge about these entities across a broader range of domains and languages is critical for information extraction (IE) applications. To facilitate text mining methods that analyze patients' health conditions and adverse drug reactions reported on the Internet and compare them with traditional sources such as drug labels, we present a new corpus of Russian-language health reviews.

RESULTS

The Russian Drug Reaction Corpus (RuDReC) is a new, partially annotated corpus of Russian-language consumer reviews about pharmaceutical products, built for detecting health-related named entities and the effectiveness of pharmaceutical products. The corpus consists of two parts, a raw part and a labeled part. The raw part includes 1.4 million health-related user-generated texts collected from various Internet sources, including social media. The labeled part contains 500 consumer reviews about drug therapy with drug- and disease-related information. Each sentence is labeled for the presence or absence of health-related issues. Sentences that mention a health-related issue are additionally labeled at the expression level to identify fine-grained subtypes such as drug classes, drug forms, drug indications and drug reactions. Further, we present a baseline model for named entity recognition (NER) and multilabel sentence classification tasks on this corpus. Our RuDR-BERT model achieved a macro F1 score of 74.85% in the NER task. For the sentence classification task, our model achieves a macro F1 score of 68.82%, a gain of 7.47% over the score of a BERT model trained on Russian data.
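
Both reported scores are macro F1 values, that is, the unweighted mean of per-class F1 scores, so rare classes count as much as frequent ones. The following is a minimal sketch of that metric for a simplified single-label setting. It is not the authors' evaluation script; the function name `macro_f1` and the label names (`ADR`, `Indication`, `Other`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0

    tp = defaultdict(int)  # true positives per class
    fp = defaultdict(int)  # false positives per class
    fn = defaultdict(int)  # false negatives per class

    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed

    scores = []
    for c in set(gold) | set(pred):
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is non-zero
        # for any class that appears in gold or pred.
        scores.append(2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]))
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
gold = ["ADR", "ADR", "Indication", "Other"]
pred = ["ADR", "Indication", "Indication", "Other"]
score = macro_f1(gold, pred)
```

In this toy example the per-class F1 values are 2/3, 2/3 and 1, so the macro average is 7/9. NER systems are commonly scored on whole entity spans rather than on individual labels, so a faithful reproduction of the reported numbers would need the task definition from the paper itself.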

AVAILABILITY AND IMPLEMENTATION

We make the RuDReC corpus and pretrained weights of domain-specific BERT models freely available at https://github.com/cimm-kzn/RuDReC.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
