Survey of Natural Language Processing Techniques in Bioinformatics.

Authors

Zeng Zhiqiang, Shi Hua, Wu Yun, Hong Zhiling

Affiliations

College of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China.

Software School, Xiamen University, Xiamen 361005, China.

Publication information

Comput Math Methods Med. 2015;2015:674296. doi: 10.1155/2015/674296. Epub 2015 Oct 7.

DOI: 10.1155/2015/674296
PMID: 26525745
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4615216/
Abstract

Informatics methods, such as text mining and natural language processing, are frequently involved in bioinformatics research. In this study, we discuss text mining and natural language processing methods in bioinformatics from two perspectives. First, we consider how text mining methods are used to search for biological knowledge, retrieve references, and reconstruct databases. For example, protein-protein interactions and gene-disease relationships can be mined from PubMed. Second, we analyze the applications of text mining and natural language processing techniques in bioinformatics, including predicting protein structure and function and detecting noncoding RNA. Finally, numerous methods and applications, as well as their contributions to bioinformatics, are discussed for future use by text mining and natural language processing researchers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/919b/4615216/7b0e96f39627/CMMM2015-674296.001.jpg
