National Center for Biotechnology Information (NCBI), U.S. National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA.
Bioinformatics. 2019 Sep 15;35(18):3533-3535. doi: 10.1093/bioinformatics/btz070.
Interest in text mining full-text biomedical research articles is growing. To facilitate automated processing of nearly 3 million full-text articles (in the PubMed Central® Open Access and Author Manuscript subsets) and to improve interoperability, we convert these articles to BioC, a simple, community-driven data structure in either XML or JavaScript Object Notation (JSON) format for conveniently sharing text and annotations.
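The following minimal Python sketch makes the BioC layout concrete by walking a BioC JSON collection (collection → documents → passages → annotations). It makes several assumptions:

- It uses the BioC JSON field names `documents`, `passages`, `offset`, `text`, `infons`, `annotations` and `locations`.
- It assumes PMC passages carry a `section_type` infon.
- It assumes annotation offsets share the document-level coordinate system of passage offsets.

The sample collection and the helper names `iter_passages` and `iter_annotations` are hypothetical and for illustration only.

```python
import json

# Hypothetical, hand-written BioC-style collection; not real PMC content.
SAMPLE = """
{
  "source": "PMC",
  "date": "20190101",
  "key": "pmc.key",
  "infons": {},
  "documents": [
    {
      "id": "PMC0000000",
      "infons": {},
      "passages": [
        {"offset": 0, "infons": {"section_type": "TITLE"},
         "text": "BRCA1 mutations in breast cancer", "annotations": []},
        {"offset": 33, "infons": {"section_type": "ABSTRACT"},
         "text": "We study BRCA1.",
         "annotations": [
           {"id": "1", "infons": {"type": "Gene"}, "text": "BRCA1",
            "locations": [{"offset": 42, "length": 5}]}
         ]}
      ],
      "relations": []
    }
  ]
}
"""


def iter_passages(collection):
    """Yield (document id, section type, text) for every passage in the collection."""
    for doc in collection.get("documents", []):
        for passage in doc.get("passages", []):
            # Assumption: PMC passages label their section in the "section_type" infon.
            section = passage.get("infons", {}).get("section_type", "")
            yield doc.get("id", ""), section, passage.get("text", "")


def iter_annotations(collection):
    """Yield (annotation type, covered text, document-level offset) for every annotation."""
    for doc in collection.get("documents", []):
        for passage in doc.get("passages", []):
            text = passage.get("text", "")
            base = passage.get("offset", 0)
            for ann in passage.get("annotations", []):
                for loc in ann.get("locations", []):
                    # Location offsets are assumed document-level, so convert them to
                    # a position within this passage before slicing its text.
                    start = loc["offset"] - base
                    span = text[start:start + loc["length"]]
                    yield ann.get("infons", {}).get("type", ""), span, loc["offset"]


if __name__ == "__main__":
    collection = json.loads(SAMPLE)
    for doc_id, section, text in iter_passages(collection):
        print(doc_id, section, text)
    for ann_type, span, offset in iter_annotations(collection):
        print(ann_type, span, offset)
```

Running the script prints each passage with its section type. It then prints the type, covered text and offset of each annotation. Keeping passages and annotations as plain dictionaries avoids any dependency beyond the standard library. A dedicated BioC library could replace these helpers for larger pipelines.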
The resulting articles can be downloaded via File Transfer Protocol (FTP) for bulk access and via a Web API for updates or more focused collections. Since the Web API became available in 2017, our BioC collection has been widely used by the research community.
https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/BioC-PMC/.
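For the Web API route, the sketch below builds a per-article request URL and parses a JSON response with the Python standard library. Several details are assumptions recalled from the BioC-PMC documentation page and should be checked there:

- The URL pattern `.../RESTful/pmcoa.cgi/BioC_<format>/<ID>/<encoding>`.
- The format values `xml` or `json`.
- The encoding values `unicode` or `ascii`.

The helper names `bioc_url` and `fetch_bioc_json` and the identifier `PMC1234567` are hypothetical. Bulk downloads over FTP are not covered here.

```python
import json
import urllib.request

# Assumed endpoint layout (check the BioC-PMC page before relying on it):
#   <BASE>/BioC_<format>/<PMC ID or PMID>/<encoding>
BASE = "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi"


def bioc_url(article_id: str, fmt: str = "json", encoding: str = "unicode") -> str:
    """Build a request URL for one article (hypothetical helper)."""
    if fmt not in ("xml", "json"):
        raise ValueError("fmt must be 'xml' or 'json'")
    if encoding not in ("unicode", "ascii"):
        raise ValueError("encoding must be 'unicode' or 'ascii'")
    return f"{BASE}/BioC_{fmt}/{article_id}/{encoding}"


def fetch_bioc_json(article_id: str, timeout: float = 30.0):
    """Download one article in BioC JSON and parse it (requires network access)."""
    with urllib.request.urlopen(bioc_url(article_id, "json"), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(bioc_url("PMC1234567"))
```

Only `bioc_url` runs offline. `fetch_bioc_json` makes a live request, so in real use it would also need handling for service errors and rate limits.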