Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Baja California, 22860, Mexico.
Universidad San Francisco de Quito, Grupo de Medicina Molecular y Traslacional (MeM&T), Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17-1200-841, Quito, Ecuador.
Sci Rep. 2020 Oct 22;10(1):18074. doi: 10.1038/s41598-020-75029-1.
The increasing interest in bioactive peptides with therapeutic potential is reflected in the large variety of biological databases published in recent years. However, discovering knowledge from these heterogeneous data sources is a nontrivial task, and it is the core of our research endeavor. We therefore devise a unified data model based on molecular similarity networks for representing a chemical reference space of bioactive peptides, which holds implicit knowledge that is not currently accessible in an explicit form in existing biological databases. Our main contribution is a novel workflow for the automatic construction of such similarity networks, enabling visual graph mining techniques to uncover new insights from the "ocean" of known bioactive peptides. The workflow relies on the following sequential steps: (i) calculation of molecular descriptors by applying statistical and aggregation operators on amino acid property vectors; (ii) a two-stage unsupervised feature selection method that identifies an optimized subset of descriptors using the concepts of entropy and mutual information; (iii) generation of sparse networks in which nodes represent bioactive peptides and edges between two nodes denote their pairwise similarity/distance relationships in the defined descriptor space; and (iv) exploratory analysis using visual inspection in combination with clustering and network science techniques. For practical purposes, the proposed workflow has been implemented in our visual analytics software tool (http://mobiosd-hub.com/starpep/) to assist researchers in extracting useful information from an integrated collection of 45,120 bioactive peptides, one of the largest and most diverse datasets in its field. Finally, we illustrate the applicability of the proposed workflow for discovering central nodes in molecular similarity networks that may represent the biologically relevant chemical space known to date.
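The four workflow steps can be illustrated with a minimal, self-contained Python sketch. It is not the starPep implementation and should be read only as an example under stated assumptions: the property scales (Kyte-Doolittle hydropathy and a simplified residue charge scale), the equal-width binning used to estimate entropy and mutual information, the thresholds `min_entropy`, `max_nmi` and `threshold`, the similarity function 1/(1 + Euclidean distance), and the use of degree centrality as the network-science measure are all choices made here for illustration. Every function name is hypothetical, and only the 20 standard amino acids are handled.

```python
import math
from collections import Counter
from itertools import combinations

# Kyte-Doolittle hydropathy scale: one residue-level property (assumed choice).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified residue charge (assumption: His counted as neutral).
CHARGE = {aa: 0.0 for aa in HYDROPATHY}
CHARGE.update({"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0})
PROPERTIES = {"hyd": HYDROPATHY, "chg": CHARGE}

def _std(xs):
    mu = sum(xs) / len(xs)
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))

# Statistical and aggregation operators applied to each property vector.
OPERATORS = {"sum": sum, "mean": lambda xs: sum(xs) / len(xs),
             "std": _std, "min": min, "max": max}

def descriptors(seq):
    """Step (i): one descriptor per (operator, property) pair."""
    out = {}
    for pname, scale in PROPERTIES.items():
        vec = [scale[aa] for aa in seq]
        for oname, op in OPERATORS.items():
            out[f"{oname}_{pname}"] = op(vec)
    return out

def discretize(values, bins=4):
    """Equal-width binning so entropy and MI can be estimated from counts."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    return [min(int((v - lo) / (hi - lo) * bins), bins - 1) for v in values]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def mutual_info(a, b):
    # I(A;B) = H(A) + H(B) - H(A,B)
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def select_features(table, min_entropy=0.5, max_nmi=0.9):
    """Step (ii): drop low-entropy descriptors, then redundant ones (min_entropy > 0)."""
    names = list(table[0])
    binned = {f: discretize([row[f] for row in table]) for f in names}
    ent = {f: entropy(binned[f]) for f in names}
    ranked = sorted((f for f in names if ent[f] >= min_entropy),
                    key=lambda f: -ent[f])
    kept = []
    for f in ranked:
        # MI normalized by the smaller entropy lies in [0, 1].
        if all(mutual_info(binned[f], binned[g]) / min(ent[f], ent[g]) <= max_nmi
               for g in kept):
            kept.append(f)
    return kept

def similarity_network(table, features, threshold=0.6):
    """Step (iii): min-max scale, then keep only edges with similarity >= threshold."""
    lo = {f: min(row[f] for row in table) for f in features}
    span = {f: (max(row[f] for row in table) - lo[f]) or 1.0 for f in features}
    pts = [[(row[f] - lo[f]) / span[f] for f in features] for row in table]
    edges = {}
    for i, j in combinations(range(len(pts)), 2):
        sim = 1.0 / (1.0 + math.dist(pts[i], pts[j]))
        if sim >= threshold:
            edges[(i, j)] = sim
    return edges

def degree_centrality(n, edges):
    """Step (iv), one simple measure: fraction of other nodes each node links to."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return [d / (n - 1) for d in deg] if n > 1 else [0.0] * n

if __name__ == "__main__":
    seqs = ["GIGKFLHSAKKFGKAFVGEIMNS", "GIGKFLKKAKKFGKAFVKILKK",
            "KWKLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK", "FLPLIGRVLSGIL",
            "ILPWKWPWWPWRR"]
    table = [descriptors(s) for s in seqs]
    feats = select_features(table)
    edges = similarity_network(table, feats)
    print(feats)
    print(degree_centrality(len(seqs), edges))
```

Raising `threshold` yields a sparser network, and nodes with high centrality are candidates for the "central nodes" the abstract refers to. In an actual analysis the descriptor set, similarity measure and cut-off would come from the workflow described in the paper rather than from the illustrative values used here.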