Nam Hee-Jo, Yamada Ryota, Park Hyun-Seok
Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea.
Fuku Corporation, Tokyo 113-0033, Japan.
Genomics Inform. 2020 Jun;18(2):e13. doi: 10.5808/GI.2020.18.2.e13. Epub 2020 Jun 16.
The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon 6 (BLAH6), we experimented with converting, annotating, and updating 301 PMC full-text articles of Genomics & Informatics using PubAnnotation, a system that provides a convenient way to add PMC publications by PMCID. This review therefore gives a tutorial overview of the iterative task of named entity recognition with the PubAnnotation/PubDictionaries/TextAE ecosystem. We also describe a tool, developed during the hackathon, that converts between Genia tagger output and the JSON format of PubAnnotation.
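The conversion between Genia tagger output and PubAnnotation JSON mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' tool. It assumes that each Genia tagger line holds five tab-separated columns (word, base form, part-of-speech tag, chunk tag, named-entity tag), that entity tags follow the B-/I-/O scheme, and that the target JSON has a "text" field and a "denotations" list whose items carry "id", "span" ("begin", "end"), and "obj". The function name genia_to_pubannotation is hypothetical. The text is rebuilt by joining tokens with single spaces, which is a simplification: offsets for a real PubAnnotation document would have to be aligned with the original PMC text.

```python
import json


def genia_to_pubannotation(tagged_lines):
    """Convert Genia-tagger-style lines to a PubAnnotation-style dict.

    Assumed input: 'word<TAB>base<TAB>POS<TAB>chunk<TAB>NE' per token,
    with NE tags such as 'B-protein', 'I-protein', or 'O'.
    """
    text = ""
    denotations = []
    open_entity = None  # entity currently being extended, if any

    def close_entity():
        # Flush the open entity (if any) into the denotation list.
        nonlocal open_entity
        if open_entity is not None:
            denotations.append({
                "id": f"T{len(denotations) + 1}",
                "span": {"begin": open_entity["begin"],
                         "end": open_entity["end"]},
                "obj": open_entity["obj"],
            })
            open_entity = None

    for line in tagged_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            # Blank or malformed line (e.g. a sentence break): end any entity.
            close_entity()
            continue
        word, tag = cols[0], cols[4]

        # Assumption: tokens are joined with one space to rebuild the text.
        begin = len(text) + 1 if text else 0
        text = f"{text} {word}" if text else word
        end = len(text)

        label = tag[2:] if tag[:2] in ("B-", "I-") else None
        continues = (
            tag.startswith("I-")
            and open_entity is not None
            and open_entity["obj"] == label
        )
        if continues:
            open_entity["end"] = end
        else:
            close_entity()
            if label is not None:
                open_entity = {"begin": begin, "end": end, "obj": label}

    close_entity()
    return {"text": text, "denotations": denotations}


if __name__ == "__main__":
    sample = [
        "Inhibition\tInhibition\tNN\tB-NP\tO",
        "of\tof\tIN\tB-PP\tO",
        "NF-kappaB\tNF-kappaB\tNN\tB-NP\tB-protein",
        "activation\tactivation\tNN\tI-NP\tO",
    ]
    print(json.dumps(genia_to_pubannotation(sample), indent=2))
```

Keeping spans as character offsets into a single "text" string means that a multi-token entity is stored as one denotation covering all of its tokens, so an I- tag only extends the end offset of the entity that is already open.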