Citizen Science for Mining the Biomedical Literature.

Author information

Tsueng Ginger, Nanis Steven M, Fouquier Jennifer, Good Benjamin M, Su Andrew I

Affiliation

Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA.

Publication information

Citiz Sci. 2016;1(2). doi: 10.5334/cstp.56. Epub 2016 Dec 31.

DOI:10.5334/cstp.56

PMID:30416754

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6226017/

Abstract

Biomedical literature represents one of the largest and fastest growing collections of unstructured biomedical knowledge. Finding critical information buried in the literature can be challenging. To extract information from free-flowing text, researchers need to: 1. identify the entities in the text (named entity recognition), 2. apply a standardized vocabulary to these entities (normalization), and 3. identify how entities in the text are related to one another (relationship extraction). Researchers have primarily approached these information extraction tasks through manual expert curation and computational methods. We have previously demonstrated that named entity recognition (NER) tasks can be crowdsourced to a group of non-experts via the paid microtask platform, Amazon Mechanical Turk (AMT), and can dramatically reduce the cost and increase the throughput of biocuration efforts. However, given the size of the biomedical literature, even information extraction via paid microtask platforms is not scalable. With our web-based application Mark2Cure (http://mark2cure.org), we demonstrate that NER tasks also can be performed by volunteer citizen scientists with high accuracy. We apply metrics from the Zooniverse Matrices of Citizen Science Success and provide the results here to serve as a basis of comparison for other citizen science projects. Further, we discuss design considerations, issues, and the application of analytics for successfully moving a crowdsourcing workflow from a paid microtask platform to a citizen science platform. To our knowledge, this study is the first application of citizen science to a natural language processing task.
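The three extraction steps named in the abstract (named entity recognition, normalization, relationship extraction) can be illustrated with a toy sketch. This is not the Mark2Cure workflow, in which human volunteers perform the annotation; dictionary lookup and same-sentence co-occurrence are simple stand-ins chosen for illustration, and the lexicon, the `DEMO:` concept IDs, and all function names are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical mini-lexicon: lowercase surface form -> (entity type, made-up concept ID).
LEXICON = {
    "cystic fibrosis": ("disease", "DEMO:D001"),
    "cf": ("disease", "DEMO:D001"),
    "cftr": ("gene", "DEMO:G001"),
    "ivacaftor": ("drug", "DEMO:C001"),
}


def recognize(sentence):
    """Step 1 (NER): return sorted (start, end, surface text) for each lexicon hit."""
    hits = []
    for term in LEXICON:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, sentence, re.IGNORECASE):
            hits.append((match.start(), match.end(), match.group(0)))
    return sorted(hits)


def normalize(mention):
    """Step 2 (normalization): map a surface form to its (type, concept ID)."""
    return LEXICON[mention.lower()]


def relate(sentence):
    """Step 3 (relationship extraction): naive same-sentence co-occurrence.

    Returns every unordered pair of distinct concept IDs found in the sentence.
    """
    concepts = sorted({normalize(text)[1] for _, _, text in recognize(sentence)})
    return list(combinations(concepts, 2))
```

Calling `relate("Ivacaftor targets CFTR in cystic fibrosis (CF).")` on this toy lexicon maps "cystic fibrosis" and "CF" to the same concept ID before pairing, which is the point of the normalization step: synonyms collapse to one concept so that relations are counted once.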

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/087e/6226017/c3a6c900364b/nihms-992547-f0001.jpg

Similar articles

Applying citizen science to gene, drug and disease relationship extraction from biomedical abstracts.

Bioinformatics. 2020 Feb 15;36(4):1226-1233. doi: 10.1093/bioinformatics/btz678.

Crowdsourcing image segmentation for deep learning: integrated platform for citizen science, paid microtask, and gamification.

Biomed Tech (Berl). 2023 Dec 26;69(3):293-305. doi: 10.1515/bmt-2023-0148. Print 2024 Jun 25.

Microtask crowdsourcing for disease mention annotation in PubMed abstracts.

Pac Symp Biocomput. 2015:282-93.

Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations.

J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S9. doi: 10.1186/1758-2946-7-S1-S9. eCollection 2015.

Challenges in clinical natural language processing for automated disorder normalization.

J Biomed Inform. 2015 Oct;57:28-37. doi: 10.1016/j.jbi.2015.07.010. Epub 2015 Jul 14.

Online citizen science with the Zooniverse for analysis of biological volumetric data.

Histochem Cell Biol. 2023 Sep;160(3):253-276. doi: 10.1007/s00418-023-02204-6. Epub 2023 Jun 7.

Integrating text mining into the MGI biocuration workflow.

Database (Oxford). 2009;2009:bap019. doi: 10.1093/database/bap019. Epub 2009 Nov 21.

FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining.

BMC Bioinformatics. 2018 Jun 28;19(1):248. doi: 10.1186/s12859-018-2211-5.

Active learning for ontological event extraction incorporating named entity recognition and unknown word handling.

J Biomed Semantics. 2016 Apr 27;7:22. doi: 10.1186/s13326-016-0059-z. eCollection 2016.

Cited by

Machine learning in healthcare citizen science: A scoping review.

Int J Med Inform. 2025 Mar;195:105766. doi: 10.1016/j.ijmedinf.2024.105766. Epub 2024 Dec 19.

Outbreak.info Research Library: a standardized, searchable platform to discover and explore COVID-19 resources.

Nat Methods. 2023 Apr;20(4):536-540. doi: 10.1038/s41592-023-01770-w. Epub 2023 Feb 23.

Outbreak.info Research Library: A standardized, searchable platform to discover and explore COVID-19 resources.

bioRxiv. 2022 Dec 7:2022.01.20.477133. doi: 10.1101/2022.01.20.477133.

Humans and machines in biomedical knowledge curation: hypertrophic cardiomyopathy molecular mechanisms' representation.

BioData Min. 2021 Oct 2;14(1):45. doi: 10.1186/s13040-021-00279-2.

Scientific Discovery Games for Biomedical Research.

Annu Rev Biomed Data Sci. 2019 Jul;2(1):253-279. doi: 10.1146/annurev-biodatasci-072018-021139.

Research data management in health and biomedical citizen science: practices and prospects.

JAMIA Open. 2019 Dec 9;3(1):113-125. doi: 10.1093/jamiaopen/ooz052. eCollection 2020 Apr.

Applying citizen science to gene, drug and disease relationship extraction from biomedical abstracts.

Bioinformatics. 2020 Feb 15;36(4):1226-1233. doi: 10.1093/bioinformatics/btz678.

Aligning Needs: Integrating Citizen Science Efforts into Schools Through Service Requirements.

Hum Comput (Fairfax). 2019;6(1):56-82.

Crowdsourcing image analysis for plant phenomics to generate ground truth data for machine learning.

PLoS Comput Biol. 2018 Jul 30;14(7):e1006337. doi: 10.1371/journal.pcbi.1006337. eCollection 2018 Jul.

An annotated corpus with nanomedicine and pharmacokinetic parameters.

Int J Nanomedicine. 2017 Oct 12;12:7519-7527. doi: 10.2147/IJN.S137117. eCollection 2017.
