Applying citizen science to gene, drug and disease relationship extraction from biomedical abstracts.

Author affiliation

Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA.

Publication information

Bioinformatics. 2020 Feb 15;36(4):1226-1233. doi: 10.1093/bioinformatics/btz678.

Abstract

MOTIVATION

Biomedical literature is growing at a rate that outpaces our ability to harness the knowledge contained therein. To mine valuable inferences from the large volume of literature, many researchers use information extraction algorithms to harvest information in biomedical texts. Information extraction is usually accomplished via a combination of manual expert curation and computational methods. Advances in computational methods usually depend on the time-consuming generation of gold standards by a limited number of expert curators. Citizen science is public participation in scientific research. We previously found that citizen scientists are willing and able to perform named entity recognition of disease mentions in biomedical abstracts, but did not know whether this also held for relationship extraction (RE).

RESULTS

In this article, we introduce the Relationship Extraction Module of the web-based application Mark2Cure (M2C) and demonstrate that citizen scientists can perform RE. We confirm the importance of accurate named entity recognition for user performance in RE and identify design issues that impacted data quality. We find that the data generated by citizen scientists can be used to identify relationship types not currently available in the M2C Relationship Extraction Module. We compare the citizen science-generated data with algorithm-mined data and identify ways in which the two approaches may complement one another. We also discuss opportunities for future improvement of this system, as well as the potential synergies between citizen science, manual biocuration and natural language processing.

AVAILABILITY AND IMPLEMENTATION

Mark2Cure platform: https://mark2cure.org; Mark2Cure source code: https://github.com/sulab/mark2cure; and data and analysis code for this article: https://github.com/gtsueng/M2C_rel_nb.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
