Assessment of software testing and quality assurance in natural language processing applications and a linguistically inspired approach to improving it.

Authors

Cohen K Bretonnel, Hunter Lawrence E, Palmer Martha

Affiliations

Computational Bioscience Program, University of Colorado School of Medicine, Aurora, Colorado, USA; Department of Linguistics, University of Colorado at Boulder, Boulder, Colorado, USA.

Publication information

Trust Eternal Syst Via Evol Softw Data Knowl (2012). 2013;379:77-90. doi: 10.1007/978-3-642-45260-4_6.

DOI:10.1007/978-3-642-45260-4_6

PMID:34308448

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8300901/

Abstract

Significant progress has been made in addressing the scientific challenges of biomedical text mining. However, the transition from a demonstration of scientific progress to the production of tools on which a broader community can rely requires that fundamental software engineering requirements be addressed. In this paper we characterize the state of biomedical text mining software with respect to software testing and quality assurance. Biomedical natural language processing software was chosen because it frequently specifically claims to offer production-quality services, rather than just research prototypes. We examined twenty web sites offering a variety of text mining services. On each web site, we performed the most basic software test known to us and classified the results. Seven out of twenty web sites returned either bad results or the worst class of results in response to this simple test. We conclude that biomedical natural language processing tools require greater attention to software quality. We suggest a linguistically motivated approach to granular evaluation of natural language processing applications, and show how it can be used to detect performance errors of several systems and to predict overall performance on specific equivalence classes of inputs. We also assess the ability of linguistically-motivated test suites to provide good software testing, as compared to large corpora of naturally-occurring data. We measure code coverage and find that it is considerably higher when even small structured test suites are utilized than when large corpora are used.
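As an illustration of what a small structured test suite organized by equivalence classes of inputs might look like, here is a minimal Python sketch. It is not the authors' implementation: the toy tagger, the class names, and the test sentences are all hypothetical, chosen only to show how per-class results can expose a weakness that an aggregate score over a large corpus could hide.

```python
import re
from collections import defaultdict

# Hypothetical system under test (an assumption, not a tool from the paper):
# a naive tagger that treats any run of capital letters, optionally followed
# by digits, as a gene mention.
def toy_gene_tagger(text):
    return re.findall(r"\b[A-Z]+[0-9]*\b", text)

# Each test case is (equivalence class, input text, expected mentions).
# The classes are illustrative groupings by surface form of the gene name.
TEST_SUITE = [
    ("all-caps symbol", "BRCA1 is mutated.", ["BRCA1"]),
    ("all-caps symbol", "TP53 is mutated.", ["TP53"]),
    ("hyphenated symbol", "IL-2 is secreted.", ["IL-2"]),
    ("lowercase-initial symbol", "p53 is mutated.", ["p53"]),
    ("empty input", "", []),
    ("no gene mention", "Cells divide rapidly.", []),
]

def run_suite(tagger, suite):
    """Return {class name: (passed, total)} for the given tagger."""
    counts = defaultdict(lambda: [0, 0])
    for cls, text, expected in suite:
        counts[cls][1] += 1
        if tagger(text) == expected:
            counts[cls][0] += 1
    return {cls: (passed, total) for cls, (passed, total) in counts.items()}

if __name__ == "__main__":
    for cls, (passed, total) in run_suite(toy_gene_tagger, TEST_SUITE).items():
        print(f"{cls}: {passed}/{total} passed")
```

In this sketch the toy tagger passes every all-caps case but fails the hyphenated and lowercase-initial classes, so the report points directly at the kinds of input it mishandles. Running the same suite under a coverage tool would be one way to compare code coverage against a run over a large corpus, as the abstract describes.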
