Quantitative chemogenomics: machine-learning models of protein-ligand interaction.

Affiliation

Department of Medical Sciences, Uppsala University, Academic Hospital, SE-751 85, Uppsala, Sweden.

Publication information

Curr Top Med Chem. 2011;11(15):1978-93. doi: 10.2174/156802611796391249.

Abstract

Chemogenomics is an emerging interdisciplinary field that lies at the interface of biology, chemistry, and informatics. Most currently used drugs are small molecules that interact with proteins. Understanding protein-ligand interaction is therefore central to drug discovery and design. In the subfield of chemogenomics known as proteochemometrics, protein-ligand-interaction models are induced from data matrices that consist of both protein and ligand information along with some experimentally measured variable. The two general aims of this quantitative multi-structure-property-relationship modeling (QMSPR) approach are to exploit sparse/incomplete information sources and to obtain more general models, covering larger parts of the protein-ligand space, than traditional approaches that focus mainly on specific targets or ligands. The data matrices, usually obtained from multiple sparse/incomplete sources, typically contain series of proteins and ligands together with quantitative information about their interactions. A useful model should ideally be easy to interpret and generalize well to new, unseen protein-ligand combinations. Achieving this requires sophisticated machine-learning methods for model induction, combined with adequate validation. This review is intended to provide a guide to methods and data sources suitable for this kind of protein-ligand-interaction modeling. An overview of the modeling process is presented, including data collection, protein and ligand descriptor computation, data preprocessing, machine-learning-model induction, and validation. Concerns and issues specific to each step in this kind of data-driven modeling are discussed.
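
To make the data-matrix idea concrete, the following is a minimal sketch of a proteochemometric-style workflow; it is not taken from the review. It assumes that each row of the matrix describes one protein-ligand pair by concatenating protein descriptors, ligand descriptors, and their pairwise cross-terms, and it swaps in closed-form ridge regression as a stand-in for the more sophisticated learners the abstract alludes to. All descriptors and activities are synthetic, and the function names (`pcm_features`, `fit_ridge`, `predict`) are hypothetical.

```python
import numpy as np


def pcm_features(protein_desc, ligand_desc):
    """Build one row of the data matrix for a protein-ligand pair.

    The row holds the protein descriptors, the ligand descriptors, and
    all pairwise products (cross-terms) between the two blocks.
    """
    cross = np.outer(protein_desc, ligand_desc).ravel()
    return np.concatenate([protein_desc, ligand_desc, cross])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(gram, Xc.T @ yc)
    return w, y_mean - x_mean @ w


def predict(X, w, b):
    return X @ w + b


# Synthetic stand-ins for real descriptors (assumption, for illustration only).
rng = np.random.default_rng(0)
proteins = rng.normal(size=(6, 3))  # 6 proteins, 3 descriptors each
ligands = rng.normal(size=(8, 4))   # 8 ligands, 4 descriptors each

pairs = [(i, j) for i in range(len(proteins)) for j in range(len(ligands))]
X = np.array([pcm_features(proteins[i], ligands[j]) for i, j in pairs])

# Synthetic "measured" activity: a linear function of the features plus noise.
true_w = rng.normal(size=X.shape[1])
y = X @ true_w + rng.normal(scale=0.1, size=len(pairs))

# Mimic a sparse matrix: only 36 of the 48 pairs are "measured" (training);
# the remaining 12 are unseen combinations of known proteins and ligands.
order = rng.permutation(len(pairs))
train, test = order[:36], order[36:]

w, b = fit_ridge(X[train], y[train], alpha=1.0)
pred = predict(X[test], w, b)

# Predictive squared correlation on the held-out pairs.
ss_res = np.sum((y[test] - pred) ** 2)
ss_tot = np.sum((y[test] - y[train].mean()) ** 2)
q2 = 1.0 - ss_res / ss_tot
print(f"held-out Q2: {q2:.3f}")
```

Holding out random pairs, as done here, only probes generalization to new combinations of proteins and ligands that each appear elsewhere in the training data. Testing on entirely new proteins or ligands would require holding out whole rows or columns of the interaction matrix instead.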
