CRFVoter: gene and protein related object recognition using a conglomerate of CRF-based tools.

Authors

Hemati Wahed, Mehler Alexander

Affiliation

Text Technology Lab, Goethe-University Frankfurt, Robert-Mayer-Straße 10, 60325, Frankfurt am Main, Germany.

Publication

J Cheminform. 2019 Mar 14;11(1):21. doi: 10.1186/s13321-019-0343-x.

DOI:10.1186/s13321-019-0343-x
PMID:30874918
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6419804/

Abstract

BACKGROUND

Gene and protein related objects are an important class of entities in biomedical research, whose identification and extraction from scientific articles is attracting increasing interest. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of gene and protein related objects. For this purpose, we transform the task as posed by BioCreative V.5 into a sequence labeling problem. We present a series of sequence labeling systems that we used and adapted in our experiments for solving this task. Our experiments show how to optimize the hyperparameters of the classifiers involved. To this end, we utilize various algorithms for hyperparameter optimization. Finally, we present CRFVoter, a two-stage application of Conditional Random Field (CRF) that integrates the optimized sequence labelers from our study into one ensemble classifier.
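
The two ideas in this paragraph, casting entity recognition as sequence labeling and feeding the outputs of several labelers into a second-stage CRF, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the tagging scheme or the second-stage features, so the BIO scheme, the whitespace tokenizer, the `GPRO` label string, and the labeler names `labeler_a` and `labeler_b` are all hypothetical.

```python
from typing import Dict, List, Tuple

# (start_char, end_char_exclusive, entity_type); layout assumed for illustration
Span = Tuple[int, int, str]


def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Whitespace tokenizer that also returns each token's character offsets."""
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Turn character-level annotations into one B-/I-/O label per token."""
    labeled = []
    for tok, start, end in tokenize_with_offsets(text):
        label = "O"
        for s_start, s_end, s_type in spans:
            # A token gets an entity label only if it lies fully inside a span.
            if start >= s_start and end <= s_end:
                label = ("B-" if start == s_start else "I-") + s_type
                break
        labeled.append((tok, label))
    return labeled


def stack_features(tokens: List[str],
                   first_stage: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Build second-stage features: each first-stage prediction becomes a feature."""
    features = []
    for i, tok in enumerate(tokens):
        feats = {"token.lower": tok.lower()}
        for name, labels in first_stage.items():
            feats["pred." + name] = labels[i]
        features.append(feats)
    return features


text = "Mutations in the BRCA1 gene alter p53 binding"
gold = spans_to_bio(text, [(17, 27, "GPRO"), (34, 37, "GPRO")])
tokens = [tok for tok, _ in gold]

# Hypothetical outputs of two first-stage labelers that disagree on "gene".
first_stage = {
    "labeler_a": ["O", "O", "O", "B-GPRO", "I-GPRO", "O", "B-GPRO", "O"],
    "labeler_b": ["O", "O", "O", "B-GPRO", "O", "O", "B-GPRO", "O"],
}
stacked = stack_features(tokens, first_stage)
```

The per-token feature dictionaries in `stacked` are in the shape that common CRF toolkits accept, so a second-stage model trained on them against the gold labels could learn which first-stage labeler to trust in which context.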

RESULTS

We analyze the impact of hyperparameter optimization on named entity recognition in biomedical research and show that this optimization yields a performance increase of up to 60%. In our evaluation, CRFVoter, our ensemble classifier based on multiple sequence labelers, outperforms each individual extractor. On the blinded test set provided by the BioCreative organizers, CRFVoter achieves an F-score of 75%, a recall of 71% and a precision of 80%. For the GPRO type 1 evaluation, CRFVoter achieves an F-score of 73% and a recall of 70%, and it achieved the best precision (77%) among all task participants.
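
The reported scores can be cross-checked against each other. The sketch below assumes that the F-score is the balanced F1 measure, the harmonic mean of precision and recall. The abstract does not state this explicitly, but both reported figures are consistent with it.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the abstract.
blinded_f1 = f1_score(precision=0.80, recall=0.71)  # about 0.752
gpro1_f1 = f1_score(precision=0.77, recall=0.70)    # about 0.733

print(f"blinded test set: {blinded_f1:.2f}")
print(f"GPRO type 1:      {gpro1_f1:.2f}")
```

Rounded to two decimals, these give 0.75 and 0.73, matching the reported F-scores of 75% and 73%.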

CONCLUSION

CRFVoter is effective when multiple sequence labeling systems are to be combined, and it performs better than each of the individual systems it aggregates.

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf65/6419804/e2a78a3a3dd4/13321_2019_343_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf65/6419804/322c01e3e84a/13321_2019_343_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf65/6419804/d7e9239a5a0b/13321_2019_343_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf65/6419804/093b2d84b747/13321_2019_343_Fig4_HTML.jpg
