Viesi Eva, Perricone Ugo, Aloy Patrick, Giugno Rosalba
Department of Computer Science, University of Verona, Verona, Italy.
Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Catalonia, Spain.
J Cheminform. 2025 Jan 31;17(1):13. doi: 10.1186/s13321-025-00961-1.
More sophisticated representations of compounds attempt to incorporate not only information on the structure and physicochemical properties of molecules but also knowledge about their biological traits, yielding a so-called bioactivity profile. Profiling the bioactivity of air pollutants is both challenging and crucial: their biological activity and toxicological effects have not yet been investigated in depth, and further exploration could shed light on the impact of air pollution on complex disorders. A biological signature that simultaneously captures the chemistry and the biology of small molecules may therefore help predict the behaviour of such ligands towards a protein target. Moreover, the interaction between biological entities can be represented through combined feature vectors that are given as input to a machine learning (ML) model to capture the underlying interaction. To this end, we propose a chemogenomic approach, called Air Pollutant Bioactivity (APBIO), which integrates compound bioactivity signatures and target sequence descriptors to train ML classifiers that are subsequently used to predict potential compound-target interactions (CTIs). We report the performance of the proposed methodology and, using external validation sets, show that it generalizes better than existing molecular representations. We have also developed a publicly available Streamlit application for APBIO at ap-bio.streamlit.app, which allows users to predict associations between investigated compounds and protein targets.
Scientific contribution
We derived ex novo bioactivity signatures for air pollutant molecules to capture their biological behaviour and associations with protein targets. The proposed chemogenomic methodology enables the prediction of novel CTIs for known or similar compounds and targets through well-established and efficient ML models, deepening our insight into the molecular interactions and mechanisms that may have a deleterious impact on human biological systems.
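The idea of a combined feature vector for a compound-target pair can be illustrated with a minimal sketch. It is not the APBIO implementation: the abstract does not say which target sequence descriptors are used, so amino acid composition stands in here as an assumed descriptor, and the function names, the four-value signature, and the protein sequence are all hypothetical.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> List[float]:
    """Fraction of each standard amino acid in a protein sequence.

    Assumed stand-in for a target sequence descriptor; the descriptors
    actually used by APBIO are not stated in the abstract.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def pair_features(signature: Sequence[float], sequence: str) -> List[float]:
    """Concatenate a compound signature with a target descriptor."""
    return list(signature) + amino_acid_composition(sequence)


# Toy inputs, invented for illustration only.
toy_signature = [0.12, -0.40, 0.88, 0.05]  # hypothetical bioactivity signature
toy_sequence = "MKTAYIAKQR"                # hypothetical protein fragment

features = pair_features(toy_signature, toy_sequence)
print(len(features))
```

Each labelled compound-target pair would contribute one such row, and the resulting matrix could then be passed to a standard binary classifier to predict whether the pair interacts.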