Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts.

Affiliation

Humboldt-Universität zu Berlin, Knowledge Management in Bioinformatics, Berlin, 10099, Germany.

Publication information

Database (Oxford). 2013 Apr 18;2013:bat020. doi: 10.1093/database/bat020. Print 2013.

DOI: 10.1093/database/bat020
PMID: 23599415
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3629873/

Abstract

Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it into specialized databases for structured delivery to users. It is a slow, error-prone, complex, costly and, yet, highly important task. Previous experiences have proven that text mining can assist in its many phases, especially, in triage of relevant documents and extraction of named entities and biological events. Here, we present the curation pipeline of the CellFinder database, a repository of cell research, which includes data derived from literature curation and microarrays to identify cell types, cell lines, organs and so forth, and especially patterns in gene expression. The curation pipeline is based on freely available tools in all text mining steps, as well as the manual validation of extracted data. Preliminary results are presented for a data set of 2376 full texts from which >4500 gene expression events in cell or anatomical part have been extracted. Validation of half of this data resulted in a precision of ~50% of the extracted data, which indicates that we are on the right track with our pipeline for the proposed task. However, evaluation of the methods shows that there is still room for improvement in the named-entity recognition and that a larger and more robust corpus is needed to achieve a better performance for event extraction. Database URL: http://www.cellfinder.org/
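
The precision figure quoted in the abstract is the share of manually validated extractions that a curator accepted. The following is a minimal sketch of that calculation, not the authors' code: the `ValidatedEvent` record, its field names, and the sample entries are hypothetical placeholders invented for illustration, and none of the values come from the CellFinder data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedEvent:
    """One extracted gene expression event after manual validation (hypothetical schema)."""
    gene: str        # gene mention found by named-entity recognition
    location: str    # cell type or anatomical part the gene is expressed in
    accepted: bool   # True if the curator confirmed the extracted event


def precision(events):
    """Return the fraction of validated events that the curator accepted."""
    if not events:
        raise ValueError("precision is undefined for an empty validation set")
    accepted = sum(1 for event in events if event.accepted)
    return accepted / len(events)


# Made-up toy records, used only to exercise the function.
sample = [
    ValidatedEvent("GENE_A", "podocyte", True),
    ValidatedEvent("GENE_B", "proximal tubule", False),
    ValidatedEvent("GENE_C", "glomerulus", True),
    ValidatedEvent("GENE_D", "kidney", False),
]

print(f"precision = {precision(sample):.2f}")  # prints "precision = 0.50"
```

Precision alone says nothing about events the pipeline missed. Estimating recall would additionally require a gold-standard set of events for the same documents.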

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5e44/3629873/5c7903159e7b/bat020f3p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5e44/3629873/a59d81684cbe/bat020f1p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5e44/3629873/4d26ff386815/bat020f2p.jpg
