iTextMine: integrated text-mining system for large-scale knowledge extraction from the literature.

Affiliations

Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE, USA.

Department of Computer and Information Sciences, University of Delaware, Newark, DE, USA.

Publication information

Database (Oxford). 2018 Jan 1;2018:bay128. doi: 10.1093/database/bay128.

DOI: 10.1093/database/bay128

PMID: 30576489

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6301332/

Abstract

Numerous efforts have been made to develop text-mining tools that automatically extract information from biomedical text. These tools have assisted in many biological tasks, such as database curation and hypothesis generation. Text-mining tools usually differ from each other in programming language, system dependency and input/output format. Few previous works address the integration of different text-mining tools and their results from large-scale text processing. In this paper, we describe the iTextMine system, which uses an automated workflow to run multiple text-mining tools on large-scale text for knowledge extraction. We employ parallel processing with dockerized text-mining tools that share a standardized JSON output format, and we implement a text alignment algorithm to resolve text discrepancies during result integration. iTextMine presently integrates four relation extraction tools, which have been used to process all Medline abstracts and PMC open access full-length articles. The website allows users to browse the text evidence and view integrated results for knowledge discovery through a network view. We demonstrate the utility of iTextMine with two use cases involving the gene PTEN and breast cancer and the gene SATB1.
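
The abstract does not spell out the text alignment algorithm, so the following is only a generic sketch of the underlying idea: when a tool processes a slightly altered copy of a passage (for example, with different whitespace), its character offsets must be mapped back onto the source text before results can be merged. The function names, the example sentence and the use of Python's difflib are all assumptions made for illustration, not the iTextMine implementation.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def build_offset_map(source: str, variant: str) -> Dict[int, int]:
    """Map character offsets in `variant` to offsets in `source`.

    Only characters inside matching blocks are mapped; characters that
    occur in just one of the two texts are left unmapped.
    """
    matcher = SequenceMatcher(None, variant, source, autojunk=False)
    mapping: Dict[int, int] = {}
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    return mapping


def project_span(
    mapping: Dict[int, int], start: int, end: int
) -> Optional[Tuple[int, int]]:
    """Project a [start, end) span of the variant onto the source text."""
    mapped = [mapping[i] for i in range(start, end) if i in mapping]
    if not mapped:
        return None  # the span has no counterpart in the source
    return mapped[0], mapped[-1] + 1


# Made-up sentence: the variant mimics a tool that altered the whitespace.
source = "SATB1 and PTEN are mentioned in this sentence."
variant = "SATB1  and  PTEN are mentioned in this sentence ."

mapping = build_offset_map(source, variant)
start = variant.index("PTEN")            # offset reported on the variant text
span = project_span(mapping, start, start + len("PTEN"))
aligned = source[span[0]:span[1]]        # the same mention in the source text
```

With offsets projected onto one shared copy of the text, annotations from different tools can be compared and merged by position.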

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c8a/6301332/281694dba951/bay128f1.jpg
