PepBank--a database of peptides based on sequence text mining and public peptide data sources.

Author information

Shtatland Timur, Guettler Daniel, Kossodo Misha, Pivovarov Misha, Weissleder Ralph

Affiliation

Center for Molecular Imaging Research, Massachusetts General Hospital, Harvard Medical School, Bldg. 149, 13th Street, Room 5406, Charlestown, MA 02129, USA.

Publication information

BMC Bioinformatics. 2007 Aug 1;8:280. doi: 10.1186/1471-2105-8-280.

PMID:17678535

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1976427/

Abstract

BACKGROUND

Peptides are important molecules with diverse biological functions and biomedical uses. To date, there is no single, searchable archive of peptide sequences or their associated biological data. Instead, peptide sequences still have to be mined from abstracts and full-length articles, and/or obtained from fragmented public sources.

DESCRIPTION

We have constructed a new database (PepBank), which at the time of writing contains a total of 19,792 individual peptide entries. The database has a web-based user interface with a simple, Google-like search function, advanced text search, and BLAST and Smith-Waterman search capabilities. The major source of peptide sequence data is text mining of MEDLINE abstracts. Another component of the database is peptide sequence data from public sources (ASPD and UniProt). An additional, smaller part of the database is manually curated from sets of full-text articles and text-mining results. We demonstrate the utility of the database with several examples of affinity ligand discovery.
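To illustrate what mining peptide sequences from abstract text involves, here is a minimal Python sketch that flags candidate sequences written either as whole words of one-letter amino acid codes or as hyphenated three-letter codes. It is not the Peptide::Pubmed algorithm (a Perl module whose method is not described here); the function name `extract_candidates`, the five-residue minimum, the `stopwords` filter, and the sample sentence are all assumptions made for illustration. Because ordinary uppercase words such as "MEDLINE" consist only of amino acid letters, a pattern this naive produces false positives, which is why a real pipeline needs additional scoring or filtering.

```python
import re

# Standard three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Assumed heuristic: a whole word of >= 5 uppercase one-letter residue codes.
ONE_LETTER = re.compile(r"\b[ACDEFGHIKLMNPQRSTVWY]{5,}\b")

# Assumed heuristic: >= 3 three-letter codes joined by hyphens, e.g. "Arg-Gly-Asp".
_RESIDUE = "|".join(THREE_TO_ONE)
THREE_LETTER = re.compile(rf"\b(?:{_RESIDUE})(?:-(?:{_RESIDUE})){{2,}}\b")


def extract_candidates(text, stopwords=frozenset()):
    """Return candidate peptide sequences (one-letter code) in order of appearance.

    One-letter words listed in `stopwords` (acronyms that happen to use only
    amino acid letters) are dropped; three-letter matches are converted.
    """
    hits = []
    for m in ONE_LETTER.finditer(text):
        if m.group() not in stopwords:
            hits.append((m.start(), m.group()))
    for m in THREE_LETTER.finditer(text):
        seq = "".join(THREE_TO_ONE[r] for r in m.group().split("-"))
        hits.append((m.start(), seq))
    hits.sort()  # order by position in the text
    return [seq for _, seq in hits]


if __name__ == "__main__":
    # Invented example sentence, not taken from any real abstract.
    sample = "Abstracts indexed in MEDLINE mention GRGDSP and Arg-Gly-Asp-Ser."
    print(extract_candidates(sample))                         # ['MEDLINE', 'GRGDSP', 'RGDS']
    print(extract_candidates(sample, stopwords={"MEDLINE"}))  # ['GRGDSP', 'RGDS']
```

In the sample sentence, "MEDLINE" is reported until it is added to `stopwords`, while "Arg-Gly-Asp-Ser" is converted to its one-letter form "RGDS".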

CONCLUSION

We have created, and maintain, a database of peptide sequences. The database has biological and medical applications, for example, predicting the binding partners of biologically interesting peptides, developing peptide-based therapeutic or diagnostic agents, or predicting the molecular targets or binding specificities of peptides resulting from phage display selection. The database is freely available at http://pepbank.mgh.harvard.edu/, and the text-mining source code (Peptide::Pubmed) is freely available at the same site as well as on CPAN (http://www.cpan.org/).
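Predicting the target of a phage display hit amounts to searching for similar, already-annotated sequences. The Python sketch below computes a textbook Smith-Waterman local alignment score and ranks a small list of peptides against a query. It is not PepBank's implementation: the scoring scheme (match +2, mismatch -1, linear gap -2) is a simple assumption, whereas practical peptide searches typically use a substitution matrix such as BLOSUM62 or PAM30 with affine gap penalties. The names `smith_waterman`, `rank_by_similarity`, and the entries in `toy_db` are hypothetical and illustrative, not PepBank records.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of sequences a and b.

    Textbook Smith-Waterman with a linear gap penalty; only two rows of the
    dynamic-programming matrix are kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring every cell at 0 is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def rank_by_similarity(query, database, **scoring):
    """Rank (name, sequence) pairs by descending score against query (ties by name)."""
    scored = [(smith_waterman(query, seq, **scoring), name) for name, seq in database]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


if __name__ == "__main__":
    # Illustrative entries only (hypothetical names, not PepBank records).
    toy_db = [("toy_1", "GRGDSP"), ("toy_2", "KLAKLAK"), ("toy_3", "ACDCRGDCFCG")]
    for score, name in rank_by_similarity("CRGDC", toy_db):
        print(name, score)  # toy_3 10, then toy_1 6, then toy_2 0
```

The query "CRGDC" occurs intact in `toy_3`, shares only the "RGD" core with `toy_1`, and shares no residues with `toy_2`, so the ranking reflects how much of the query each entry contains.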
