NLM-Chem-BC7: manually annotated full-text resources for chemical entity annotation and indexing in biomedical articles.

Affiliation

National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.

Publication information

Database (Oxford). 2022 Dec 1;2022. doi: 10.1093/database/baac102.

DOI:10.1093/database/baac102
PMID:36458799
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9716560/
Abstract

The automatic recognition of chemical names and their corresponding database identifiers in biomedical text is an important first step for many downstream text-mining applications. The task is even more challenging when considering the identification of these entities in the article's full text and, furthermore, the identification of candidate substances for that article's metadata [Medical Subject Heading (MeSH) article indexing]. The National Library of Medicine (NLM)-Chem track at BioCreative VII aimed to foster the development of algorithms that can predict with high quality the chemical entities in the biomedical literature and further identify the chemical substances that are candidates for article indexing. As a result of this challenge, the NLM-Chem track produced two comprehensive, manually curated corpora annotated with chemical entities and indexed with chemical substances: the chemical identification corpus and the chemical indexing corpus. The NLM-Chem BioCreative VII (NLM-Chem-BC7) Chemical Identification corpus consists of 204 full-text PubMed Central (PMC) articles, fully annotated for chemical entities by 12 NLM indexers for both span (i.e. named entity recognition) and normalization (i.e. entity linking) using MeSH. This resource was used for the training and testing of the Chemical Identification task to evaluate the accuracy of algorithms in predicting chemicals mentioned in recently published full-text articles. The NLM-Chem-BC7 Chemical Indexing corpus consists of 1333 recently published PMC articles, equipped with chemical substance indexing by manual experts at the NLM. This resource was used for the evaluation of the Chemical Indexing task, which evaluated the accuracy of algorithms in predicting the chemicals that should be indexed, i.e. appear in the listing of MeSH terms for the document. 
This set was further enriched after the challenge in two ways: (i) 11 NLM indexers manually verified each of the candidate terms appearing in the prediction results of the challenge participants, but not in the MeSH indexing, and the chemical indexing terms appearing in the MeSH indexing list, but not in the prediction results, and (ii) the challenge organizers algorithmically merged the chemical entity annotations in the full text for all predicted chemical entities and used a statistical approach to keep those with the highest degree of confidence. As a result, the NLM-Chem-BC7 Chemical Indexing corpus is a gold-standard corpus for chemical indexing of journal articles and a silver-standard corpus for chemical entity identification in full-text journal articles. Together, these resources are currently the most comprehensive resources for chemical entity recognition, and we demonstrate improvements in the chemical entity recognition algorithms. We detail the characteristics of these novel resources and make them available for the community. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/NLM-Chem-BC7-corpus/.
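
The two post-challenge enrichment steps can be illustrated with a minimal Python sketch. It is not the organizers' implementation: the function names, the data layout (sets of MeSH identifiers per team, and `(start, end, mesh_id)` tuples for full-text annotations), the example identifiers, and the `min_agreement` threshold are all assumptions made for illustration. In particular, the abstract only says that "a statistical approach" was used to keep high-confidence annotations; the simple vote-fraction filter below is a stand-in for it.

```python
from collections import Counter


def terms_to_verify(predicted_by_team, mesh_indexed):
    """Step (i), sketched: find the indexing terms that need manual review.

    predicted_by_team: dict mapping a (hypothetical) team name to the set of
        MeSH identifiers that team predicted for one article.
    mesh_indexed: set of MeSH identifiers in the article's existing indexing.
    """
    predicted = set().union(*predicted_by_team.values())
    return {
        # Predicted by at least one participant but absent from the indexing.
        "predicted_not_indexed": predicted - mesh_indexed,
        # Present in the indexing but predicted by no participant.
        "indexed_not_predicted": mesh_indexed - predicted,
    }


def merge_annotations(annotations_by_team, min_agreement=0.5):
    """Step (ii), sketched: merge predicted full-text annotations by voting.

    annotations_by_team: dict mapping a team name to a list of
        (start, end, mesh_id) tuples for one article.
    min_agreement: assumed cutoff; an annotation is kept when at least this
        fraction of teams produced exactly the same tuple.
    """
    votes = Counter()
    for spans in annotations_by_team.values():
        votes.update(set(spans))  # each team votes at most once per annotation
    n_teams = len(annotations_by_team)
    return sorted(a for a, count in votes.items() if count / n_teams >= min_agreement)


# Illustrative inputs only; the identifiers and offsets are made up for the example.
predicted_by_team = {
    "team_a": {"D001241", "D002110"},
    "team_b": {"D001241", "D005947"},
}
mesh_indexed = {"D001241", "D000082"}
review = terms_to_verify(predicted_by_team, mesh_indexed)

annotations_by_team = {
    "team_a": [(0, 7, "D001241"), (20, 28, "D002110")],
    "team_b": [(0, 7, "D001241"), (40, 47, "D005947")],
    "team_c": [(0, 7, "D001241"), (20, 28, "D002110")],
}
silver = merge_annotations(annotations_by_team, min_agreement=0.6)
```

With these inputs, `review` holds the two groups of terms an indexer would check, and `silver` keeps only the annotations that at least 60% of the hypothetical teams agree on. Requiring exact agreement on both the span and the identifier is a deliberately strict choice; a real pipeline might also handle overlapping spans.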
