The compressed feature matrix--a fast method for feature based substructure search.

Authors

Abolmaali S F Badreddin, Wegner Jörg K, Zell Andreas

Affiliation

Department of Computer Science, University of Tuebingen, Sand 1, 72076 Tübingen, Germany.

Publication

J Mol Model. 2003 Aug;9(4):235-41. doi: 10.1007/s00894-003-0126-0. Epub 2003 Apr 26.

Abstract

The compressed feature matrix (CFM) is a feature-based molecular descriptor for fast processing in pharmacochemical applications such as adaptive similarity search, pharmacophore development and substructure search. Depending on the purpose, the descriptor can be generated from either topological or Euclidean molecular data. To keep the descriptor flexible, the user is free to define how structural patterns are assigned to feature types. This step is based on a graph algorithm for substructure search and thus resembles common substructure descriptors. Whereas those descriptors only allow screening for predefined patterns, the CFM permits a true substructure/subgraph search, provided that all desired elements of the query substructure are described by the selected feature set. In this work, the CFM-based substructure search is evaluated with regard to both the search speed and the different outputs that result from varying the feature set. The programmable atom typer (PATTY) graph algorithm serves as the benchmark. The CFM-based matrix algorithm is up to several hundred times faster than PATTY, and when the CFM is used as a basis for substructure screening, the search is accelerated by three orders of magnitude. The CFM-based substructure search therefore meets the requirements for interactive use, even when several hundred thousand compounds are evaluated. The concept of the CFM is implemented in the software COFEA.

Figure: CFM-based substructure search using the compounds dopamine and benzene-1,2-diol.
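The abstract distinguishes screening for features from a real substructure/subgraph search. The sketch below illustrates that general distinction only; it is not the CFM algorithm, PATTY, or anything from COFEA, whose internals the abstract does not describe. It assumes a toy feature set in which each heavy atom's element symbol is its feature type, and the helper names (`make_graph`, `passes_screen`, `has_substructure`) are hypothetical. The molecules are hand-coded heavy-atom graphs of dopamine, benzene-1,2-diol and, as a counterexample, benzene-1,3-diol.

```python
from collections import Counter


def make_graph(labels, bonds):
    """Build (labels, adjacency) for a molecule given as a labeled graph."""
    adj = {i: set() for i in range(len(labels))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return labels, adj


def passes_screen(query, target):
    """Cheap necessary condition: the target holds at least as many atoms
    of every feature type as the query does."""
    q_counts, t_counts = Counter(query[0]), Counter(target[0])
    return all(t_counts[f] >= n for f, n in q_counts.items())


def has_substructure(query, target):
    """Exact subgraph search by plain backtracking (toy sizes only)."""
    q_labels, q_adj = query
    t_labels, t_adj = target

    def extend(mapping):
        if len(mapping) == len(q_labels):
            return True
        q = len(mapping)  # query atoms are assigned in index order
        for t in range(len(t_labels)):
            if t in mapping.values() or t_labels[t] != q_labels[q]:
                continue
            # every bond to an already mapped query atom must exist in the target
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]
        return False

    return extend({})


RING = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

# Heavy-atom graphs; atom indices are arbitrary labels chosen for this sketch.
dopamine = make_graph(
    ["C"] * 6 + ["O", "O", "C", "C", "N"],
    RING + [(2, 6), (3, 7), (0, 8), (8, 9), (9, 10)],
)
catechol = make_graph(["C"] * 6 + ["O", "O"], RING + [(0, 6), (1, 7)])    # benzene-1,2-diol
resorcinol = make_graph(["C"] * 6 + ["O", "O"], RING + [(0, 6), (2, 7)])  # benzene-1,3-diol

for name, query in [("benzene-1,2-diol", catechol), ("benzene-1,3-diol", resorcinol)]:
    print(name, passes_screen(query, dopamine), has_substructure(query, dopamine))
```

With these toy graphs, benzene-1,2-diol passes the screen and is found in dopamine, while benzene-1,3-diol passes the same screen but fails the exact search. A screen can only rule candidates out, which is why it is typically used to shrink the set handed to a more expensive exact search.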
