Automatic construction of molecular similarity networks for visual graph mining in chemical space of bioactive peptides: an unsupervised learning approach.

Affiliations

Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Baja California, 22860, Mexico.

Universidad San Francisco de Quito, Grupo de Medicina Molecular y Traslacional (MeM&T), Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17-1200-841, Quito, Ecuador.

Publication information

Sci Rep. 2020 Oct 22;10(1):18074. doi: 10.1038/s41598-020-75029-1.

DOI: 10.1038/s41598-020-75029-1

PMID: 33093586

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7583304/

Abstract

The growing interest in bioactive peptides with therapeutic potential is reflected in the large variety of biological databases published in recent years. However, discovering knowledge from these heterogeneous data sources is a nontrivial task, and it is the focus of our research. We therefore devise a unified data model, based on molecular similarity networks, for representing a chemical reference space of bioactive peptides; this space carries implicit knowledge that existing biological databases do not currently make explicitly accessible. Our main contribution is a novel workflow for the automatic construction of such similarity networks, enabling visual graph mining techniques to uncover new insights from the "ocean" of known bioactive peptides. The workflow relies on the following sequential steps: (i) calculation of molecular descriptors by applying statistical and aggregation operators to amino acid property vectors; (ii) a two-stage unsupervised feature selection method that identifies an optimized subset of descriptors using the concepts of entropy and mutual information; (iii) generation of sparse networks in which nodes represent bioactive peptides and an edge between two nodes denotes their pairwise similarity/distance relationship in the defined descriptor space; and (iv) exploratory analysis that combines visual inspection with clustering and network science techniques. For practical purposes, the proposed workflow has been implemented in our visual analytics software tool (http://mobiosd-hub.com/starpep/) to assist researchers in extracting useful information from an integrated collection of 45,120 bioactive peptides, one of the largest and most diverse datasets in its field. Finally, we illustrate the applicability of the proposed workflow for discovering central nodes in molecular similarity networks that may represent a biologically relevant chemical space known to date.
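
The four steps listed in the abstract can be illustrated with a small sketch. This is not the StarPep implementation, only a minimal example under stated assumptions: the Kyte-Doolittle hydropathy index stands in for the amino acid property vectors, the descriptors are four simple aggregates, feature selection covers only an entropy filter (the mutual-information stage is omitted), and all function names, bin counts, and thresholds are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index; an assumed example property,
# not necessarily one of the properties used by the authors.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def descriptors(seq):
    """Step (i): aggregate a per-residue property vector into descriptors."""
    values = [HYDROPATHY[aa] for aa in seq]
    return [mean(values), min(values), max(values), pstdev(values)]

def shannon_entropy(column, bins=4):
    """Entropy in bits of one descriptor after equal-width binning."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return 0.0  # a constant descriptor carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in column)
    n = len(column)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def select_features(matrix, min_entropy=1.0):
    """Step (ii), entropy stage only: keep sufficiently informative columns."""
    columns = list(zip(*matrix))
    return [j for j, col in enumerate(columns)
            if shannon_entropy(col) >= min_entropy]

def build_network(matrix, keep, threshold=0.8):
    """Step (iii): link peptides whose similarity reaches the threshold."""
    scaled_cols = []
    for j in keep:
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # avoid division by zero
        scaled_cols.append([(v - lo) / span for v in col])
    points = list(zip(*scaled_cols))
    d_max = math.sqrt(len(keep)) or 1.0  # largest distance in the unit cube
    edges = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            similarity = 1.0 - math.dist(points[i], points[j]) / d_max
            if similarity >= threshold:
                edges.add((i, j))
    return edges

def degree_centrality(n_nodes, edges):
    """Step (iv): a simple centrality score, degree / (n_nodes - 1)."""
    degree = Counter()
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return [degree[k] / (n_nodes - 1) for k in range(n_nodes)]

# Toy sequences, invented for this example.
peptides = ["KKKK", "KRKR", "LLLL", "LILV", "GASG"]
matrix = [descriptors(p) for p in peptides]
keep = select_features(matrix)
edges = build_network(matrix, keep)
print(degree_centrality(len(peptides), edges))
```

Thresholding the similarity is what keeps the network sparse: raising `threshold` removes edges, so the cutoff directly controls how dense the resulting graph is and which nodes appear central.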

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3c31/7583304/bc854f1cba7f/41598_2020_75029_Fig1_HTML.jpg

Similar articles

Graph-based data integration from bioactive peptide databases of pharmaceutical interest: toward an organized collection enabling visual network analysis.

Bioinformatics. 2019 Nov 1;35(22):4739-4747. doi: 10.1093/bioinformatics/btz260.

Network Science and Group Fusion Similarity-Based Searching to Explore the Chemical Space of Antiparasitic Peptides.

ACS Omega. 2022 Dec 6;7(50):46012-46036. doi: 10.1021/acsomega.2c03398. eCollection 2022 Dec 20.

StarPep Toolbox: an open-source software to assist chemical space analysis of bioactive peptides and their functions using complex networks.

Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad506.

Unsupervised construction of computational graphs for gene expression data with explicit structural inductive biases.

Bioinformatics. 2022 Feb 7;38(5):1320-1327. doi: 10.1093/bioinformatics/btab830.

TargetHunter: an in silico target identification tool for predicting therapeutic potential of small organic molecules based on chemogenomic database.

AAPS J. 2013 Apr;15(2):395-406. doi: 10.1208/s12248-012-9449-z. Epub 2013 Jan 5.

Clustering approaches for visual knowledge exploration in molecular interaction networks.

BMC Bioinformatics. 2018 Aug 29;19(1):308. doi: 10.1186/s12859-018-2314-z.

Algorithms for effective querying of compound graph-based pathway databases.

BMC Bioinformatics. 2009 Nov 16;10:376. doi: 10.1186/1471-2105-10-376.

Text mining-based word representations for biomedical data analysis and protein-protein interaction networks in machine learning tasks.

PLoS One. 2021 Oct 15;16(10):e0258623. doi: 10.1371/journal.pone.0258623. eCollection 2021.

SpirPep: an in silico digestion-based platform to assist bioactive peptides discovery from a genome-wide database.

BMC Bioinformatics. 2018 Apr 20;19(1):149. doi: 10.1186/s12859-018-2143-0.

Cited by

Optimal Descriptor Subset Search via Chemical Information and Target Activity-Guided Algorithm for Antimicrobial Peptide Prediction.

J Chem Inf Model. 2025 Jul 14;65(13):6621-6631. doi: 10.1021/acs.jcim.5c00600. Epub 2025 Jun 18.

Unlocking Antimicrobial Peptides: In Silico Proteolysis and Artificial Intelligence-Driven Discovery from Cnidarian Omics.

Molecules. 2025 Jan 25;30(3):550. doi: 10.3390/molecules30030550.

Advances of deep Neural Networks (DNNs) in the development of peptide drugs.

Future Med Chem. 2025 Feb;17(4):485-499. doi: 10.1080/17568919.2025.2463319. Epub 2025 Feb 12.

Unveiling Encrypted Antimicrobial Peptides from Cephalopods' Salivary Glands: A Proteolysis-Driven Virtual Approach.

ACS Omega. 2024 Oct 14;9(43):43353-43367. doi: 10.1021/acsomega.4c01959. eCollection 2024 Oct 29.

Peptide hemolytic activity analysis using visual data mining of similarity-based complex networks.

NPJ Syst Biol Appl. 2024 Oct 4;10(1):115. doi: 10.1038/s41540-024-00429-2.

Innovative Alignment-Based Method for Antiviral Peptide Prediction.

Antibiotics (Basel). 2024 Aug 14;13(8):768. doi: 10.3390/antibiotics13080768.

Metabolic Connectome and Its Role in the Prediction, Diagnosis, and Treatment of Complex Diseases.

Metabolites. 2024 Jan 26;14(2):93. doi: 10.3390/metabo14020093.

Machine Learning-Enabled Genome Mining and Bioactivity Prediction of Natural Products.

ACS Synth Biol. 2023 Sep 15;12(9):2650-2662. doi: 10.1021/acssynbio.3c00234. Epub 2023 Aug 22.

StarPep Toolbox: an open-source software to assist chemical space analysis of bioactive peptides and their functions using complex networks.

Bioinformatics. 2023 Aug 1;39(8). doi: 10.1093/bioinformatics/btad506.

Complex Networks Analyses of Antibiofilm Peptides: An Emerging Tool for Next-Generation Antimicrobials' Discovery.

Antibiotics (Basel). 2023 Apr 13;12(4):747. doi: 10.3390/antibiotics12040747.
