Department of Agricultural Biotechnology, Seoul National University, Seoul, 08826, Republic of Korea.
Bio-MAX Institute, Seoul National University, Seoul, 08826, Republic of Korea.
BMC Bioinformatics. 2022 Jun 7;23(1):218. doi: 10.1186/s12859-022-04752-5.
Due to their diverse bioactivity, natural products (NPs) have been developed as commercial products in the pharmaceutical, food and cosmetic sectors, both as natural compounds (NCs) and in the form of extracts. Following administration, NCs typically interact with multiple target proteins to elicit their effects. Various machine learning models have been developed to predict multi-target modulating NCs with desired physiological effects. However, because existing chemical-protein interaction datasets are mostly single-labeled and limited, the existing models struggle to predict new chemical-protein interactions. New techniques are needed to overcome these limitations.
We propose a novel NC discovery model called OptNCMiner. The model is trained end to end, with a feature extraction step included in the training, and it predicts multi-target modulating NCs through multi-label learning. In addition, it offers a few-shot learning approach for predicting NC-protein interactions from a small training dataset. OptNCMiner achieved higher recall than conventional classification models. It was tested on the prediction of NC-protein interactions using small datasets and in a use case identifying multi-target modulating NCs for type 2 diabetes mellitus complications.
OptNCMiner identifies NCs that modulate multiple target proteins, which facilitates the discovery of novel NCs with desirable health benefits and the understanding of their biological activity.
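To make the evaluation setting concrete, the following is a minimal sketch of how recall can be computed per target protein when predictions are multi-label, and how compounds predicted to hit several targets can be picked out. It is not the OptNCMiner implementation: the function names `per_target_recall` and `multi_target_hits`, the `min_targets` threshold, and the toy matrices are all assumptions introduced here for illustration. Rows stand for compounds, columns for target proteins, and 1 marks an interaction.

```python
from typing import List


def per_target_recall(y_true: List[List[int]], y_pred: List[List[int]]) -> List[float]:
    """Recall for each target (column) of a 0/1 compound-by-target matrix."""
    n_targets = len(y_true[0])
    recalls = []
    for t in range(n_targets):
        # True positives: known interactions that the model also predicted.
        tp = sum(1 for yt, yp in zip(y_true, y_pred) if yt[t] == 1 and yp[t] == 1)
        # False negatives: known interactions that the model missed.
        fn = sum(1 for yt, yp in zip(y_true, y_pred) if yt[t] == 1 and yp[t] == 0)
        # Recall is undefined when a target has no known interactions.
        recalls.append(tp / (tp + fn) if (tp + fn) else float("nan"))
    return recalls


def multi_target_hits(y_pred: List[List[int]], min_targets: int = 2) -> List[int]:
    """Indices of compounds predicted to interact with at least min_targets targets."""
    return [i for i, row in enumerate(y_pred) if sum(row) >= min_targets]


# Hypothetical toy data: 4 compounds, 3 target proteins.
y_true = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
y_pred = [[1, 1, 0], [0, 0, 0], [0, 1, 1], [0, 1, 1]]

print(per_target_recall(y_true, y_pred))  # [0.5, 1.0, 1.0]
print(multi_target_hits(y_pred))          # [0, 2, 3]
```

Recall is the natural metric to watch in a screening setting like this one, because a missed interaction (a false negative) means a candidate compound is never followed up, whereas a false positive can still be filtered out in later validation.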