A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment.

Affiliation

Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK.

Publication information

BMC Bioinformatics. 2011 Mar 8;12:69. doi: 10.1186/1471-2105-12-69.

DOI:10.1186/1471-2105-12-69

PMID:21385430

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3060841/

Abstract

BACKGROUND

Many practical tasks in biomedicine require accessing specific types of information in scientific literature; e.g. information about the results or conclusions of the study in question. Several schemes have been developed to characterize such information in scientific journal articles. For example, a simple section-based scheme assigns individual sentences in abstracts under sections such as Objective, Methods, Results and Conclusions. Some schemes of textual information structure have proved useful for biomedical text mining (BIO-TM) tasks (e.g. automatic summarization). However, user-centered evaluation in the context of real-life tasks has been lacking.
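As a rough illustration of the section-based scheme described above, the following sketch labels each sentence of a structured abstract with the heading it appears under. The function names, the heading set, and the naive sentence splitter are assumptions made for this example; they are not the annotation tooling used in the study.

```python
import re

# Heading names follow the section-based scheme described in the text.
# The set itself is an assumption chosen for this sketch.
SECTION_HEADINGS = {"OBJECTIVE", "BACKGROUND", "METHODS", "RESULTS", "CONCLUSIONS"}


def label_sentences(abstract_text):
    """Return (section, sentence) pairs for a structured abstract.

    Sentences that appear before any heading get the label None.
    """
    labelled = []
    current = None
    for line in abstract_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.upper() in SECTION_HEADINGS:
            current = line.upper()
            continue
        # Naive split on ., ! or ? followed by whitespace; abbreviations
        # such as "e.g." would be split incorrectly by this rule.
        for sentence in re.split(r"(?<=[.!?])\s+", line):
            if sentence:
                labelled.append((current, sentence))
    return labelled


def sentences_in(labelled, section):
    """Select only the sentences assigned to one section."""
    return [sentence for sec, sentence in labelled if sec == section]
```

A reader interested only in findings could then call `sentences_in(label_sentences(text), "RESULTS")` to pull out the sentences filed under that heading.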

METHODS

We take three schemes of different type and granularity--those based on section names, Argumentative Zones (AZ) and Core Scientific Concepts (CoreSC)--and evaluate their usefulness for a real-life task which focuses on biomedical abstracts: Cancer Risk Assessment (CRA). We annotate a corpus of CRA abstracts according to each scheme, develop classifiers for automatic identification of the schemes in abstracts, and evaluate both the manual and automatic classifications directly as well as in the context of CRA.
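The abstract does not say which learning algorithm the classifiers use, so the sketch below swaps in a multinomial naive Bayes model with add-one smoothing purely to show what sentence-level scheme classification can look like. The class name, the tokenizer, and the four toy training sentences are invented for illustration and are not data or code from the study.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


class NaiveBayesSentenceClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.label_counts[label] += 1
            for tok in tokenize(sentence):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, sentence):
        total = sum(self.label_counts.values())
        tokens = tokenize(sentence)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch (not from the CRA corpus).
TOY_SENTENCES = [
    "We aimed to assess exposure risk",
    "We annotated abstracts and trained classifiers",
    "Results show assessors were significantly faster",
    "We conclude the schemes benefit risk assessment",
]
TOY_LABELS = ["Objective", "Methods", "Results", "Conclusions"]

clf = NaiveBayesSentenceClassifier().fit(TOY_SENTENCES, TOY_LABELS)
```

The same interface would apply to the AZ or CoreSC schemes by changing the label set; a realistic system would need a properly annotated corpus and richer features than bag-of-words counts.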

RESULTS

Our results show that for each scheme, the majority of categories appear in abstracts, although two of the schemes (AZ and CoreSC) were developed originally for full journal articles. All the schemes can be identified in abstracts relatively reliably using machine learning. Moreover, when cancer risk assessors are presented with scheme-annotated abstracts, they find relevant information significantly faster than when presented with unannotated abstracts, even when the annotations are produced using an automatic classifier. Interestingly, in this user-based evaluation the coarse-grained scheme based on section names proved nearly as useful for CRA as the finest-grained CoreSC scheme.

CONCLUSIONS

We have shown that existing schemes aimed at capturing information structure of scientific documents can be applied to biomedical abstracts and can be identified in them automatically with an accuracy which is high enough to benefit a real-life task in biomedicine.

Similar articles

Weakly supervised learning of information structure of scientific abstracts--is it accurate enough to benefit real-world tasks in biomedicine?

Bioinformatics. 2011 Nov 15;27(22):3179-85. doi: 10.1093/bioinformatics/btr536. Epub 2011 Sep 22.

Active learning-based information structure analysis of full scientific articles and two applications for biomedical literature review.

Bioinformatics. 2013 Jun 1;29(11):1440-7. doi: 10.1093/bioinformatics/btt163. Epub 2013 Apr 5.

The first step in the development of Text Mining technology for Cancer Risk Assessment: identifying and organizing scientific evidence in risk assessment literature.

BMC Bioinformatics. 2009 Sep 22;10:303. doi: 10.1186/1471-2105-10-303.

Unsupervised discovery of information structure in biomedical documents.

Bioinformatics. 2015 Apr 1;31(7):1084-92. doi: 10.1093/bioinformatics/btu758. Epub 2014 Nov 18.

Automatically classifying sentences in full-text biomedical articles into Introduction, Methods, Results and Discussion.

Bioinformatics. 2009 Dec 1;25(23):3174-80. doi: 10.1093/bioinformatics/btp548. Epub 2009 Sep 25.

Automatic recognition of conceptualization zones in scientific articles and two life science applications.

Bioinformatics. 2012 Apr 1;28(7):991-1000. doi: 10.1093/bioinformatics/bts071. Epub 2012 Feb 8.

Extracting semantically enriched events from biomedical literature.

BMC Bioinformatics. 2012 May 23;13:108. doi: 10.1186/1471-2105-13-108.

The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text.

BMC Bioinformatics. 2011 Oct 3;12 Suppl 8(Suppl 8):S3. doi: 10.1186/1471-2105-12-S8-S3.

Automatic classification of sentences to support Evidence Based Medicine.

BMC Bioinformatics. 2011 Mar 29;12 Suppl 2(Suppl 2):S5. doi: 10.1186/1471-2105-12-S2-S5.

Cited by

A survey on clinical natural language processing in the United Kingdom from 2007 to 2022.

NPJ Digit Med. 2022 Dec 21;5(1):186. doi: 10.1038/s41746-022-00730-6.

Automating Identification of Multiple Chronic Conditions in Clinical Practice Guidelines.

AMIA Jt Summits Transl Sci Proc. 2015 Mar 25;2015:456-60. eCollection 2015.

Protein interaction network constructing based on text mining and reinforcement learning with application to prostate cancer.

IET Syst Biol. 2015 Aug;9(4):106-12. doi: 10.1049/iet-syb.2014.0050.

Three hybrid classifiers for the detection of emotions in suicide notes.

Biomed Inform Insights. 2012;5(Suppl. 1):175-84. doi: 10.4137/BII.S8967. Epub 2012 Jan 30.

Text mining for literature review and knowledge discovery in cancer risk assessment and research.

PLoS One. 2012;7(4):e33427. doi: 10.1371/journal.pone.0033427. Epub 2012 Apr 12.

Automatic recognition of conceptualization zones in scientific articles and two life science applications.

Bioinformatics. 2012 Apr 1;28(7):991-1000. doi: 10.1093/bioinformatics/bts071. Epub 2012 Feb 8.
