Text as data: using text-based features for proteins representation and for computational prediction of their characteristics.

Authors

Shatkay Hagit, Brady Scott, Wong Andrew

Affiliations

Dept. of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA; Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711, USA; Computational Biology and Machine Learning Lab, School of Computing, Queen's University, Kingston, ON K7L 3N6, Canada.

School of Medicine, University of Toronto, Toronto, ON M5S 1A8, Canada; Computational Biology and Machine Learning Lab, School of Computing, Queen's University, Kingston, ON K7L 3N6, Canada.

Publication information

Methods. 2015 Mar;74:54-64. doi: 10.1016/j.ymeth.2014.10.027. Epub 2014 Nov 15.

DOI:10.1016/j.ymeth.2014.10.027

PMID:25448299

Abstract

The current era of large-scale biology is characterized by a fast-paced growth in the number of sequenced genomes and, consequently, by a multitude of identified proteins whose function has yet to be determined. Simultaneously, any known or postulated information concerning genes and proteins is part of the ever-growing published scientific literature, which is expanding at a rate of over a million new publications per year. Computational tools that attempt to automatically predict and annotate protein characteristics, such as function and localization patterns, are being developed along with systems that aim to support the process via text mining. Most work on protein characterization focuses on features derived directly from protein sequence data. Protein-related work that does aim to utilize the literature typically concentrates on extracting specific facts (e.g., protein interactions) from text. In the past few years we have taken a different route, treating the literature as a source of text-based features, which can be employed just as sequence-based protein-features were used in earlier work, for predicting protein subcellular location and possibly also function. We discuss here in detail the overall approach, along with results from work we have done in this area demonstrating the value of this method and its potential use.

Similar articles

Text as data: using text-based features for proteins representation and for computational prediction of their characteristics.

Methods. 2015 Mar;74:54-64. doi: 10.1016/j.ymeth.2014.10.027. Epub 2014 Nov 15.

Protein function prediction using text-based features extracted from the biomedical literature: the CAFA challenge.

BMC Bioinformatics. 2013;14 Suppl 3(Suppl 3):S14. doi: 10.1186/1471-2105-14-S3-S14. Epub 2013 Feb 28.

Protein-protein interaction predictions using text mining methods.

Methods. 2015 Mar;74:47-53. doi: 10.1016/j.ymeth.2014.10.026. Epub 2014 Oct 28.

Terminological resources for text mining over biomedical scientific literature.

Artif Intell Med. 2011 Jun;52(2):107-14. doi: 10.1016/j.artmed.2011.04.011. Epub 2011 Jun 11.

Integrating protein-protein interactions and text mining for protein function prediction.

BMC Bioinformatics. 2008 Jul 22;9 Suppl 8(Suppl 8):S2. doi: 10.1186/1471-2105-9-S8-S2.

A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts.

PLoS Comput Biol. 2018 Feb 15;14(2):e1005962. doi: 10.1371/journal.pcbi.1005962. eCollection 2018 Feb.

Text mining in livestock animal science: introducing the potential of text mining to animal sciences.

J Anim Sci. 2012 Oct;90(10):3666-76. doi: 10.2527/jas.2011-4841. Epub 2012 Jun 4.

BioConceptVec: Creating and evaluating literature-based biomedical concept embeddings on a large scale.

PLoS Comput Biol. 2020 Apr 23;16(4):e1007617. doi: 10.1371/journal.pcbi.1007617. eCollection 2020 Apr.

DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation.

Methods. 2018 Aug 1;145:82-90. doi: 10.1016/j.ymeth.2018.05.026. Epub 2018 Jun 6.

Facts from text: can text mining help to scale-up high-quality manual curation of gene products with ontologies?

Brief Bioinform. 2008 Nov;9(6):466-78. doi: 10.1093/bib/bbn043. Epub 2008 Dec 6.

Cited by

DES-ROD: Exploring Literature to Develop New Links between RNA Oxidation and Human Diseases.

Oxid Med Cell Longev. 2020 Mar 27;2020:5904315. doi: 10.1155/2020/5904315. eCollection 2020.

Literature-Based Enrichment Insights into Redox Control of Vascular Biology.

Oxid Med Cell Longev. 2019 May 16;2019:1769437. doi: 10.1155/2019/1769437. eCollection 2019.

Predicting protein functions by applying predicate logic to biomedical literature.

BMC Bioinformatics. 2019 Feb 8;20(1):71. doi: 10.1186/s12859-019-2594-y.

The research on gene-disease association based on text-mining of PubMed.

BMC Bioinformatics. 2018 Feb 7;19(1):37. doi: 10.1186/s12859-018-2048-y.

Text Mining for Protein Docking.

PLoS Comput Biol. 2015 Dec 9;11(12):e1004630. doi: 10.1371/journal.pcbi.1004630. eCollection 2015 Dec.

Evaluating a variety of text-mined features for automatic protein function prediction with GOstruct.

J Biomed Semantics. 2015 Mar 18;6:9. doi: 10.1186/s13326-015-0006-4. eCollection 2015.
