Sitzmann M, Filippov I V, Nicklaus M C
Computer-Aided Drug Design Group, Laboratory of Medicinal Chemistry, Center for Cancer Research, National Cancer Institute, National Institutes of Health, DHHS, Frederick, MD, USA.
SAR QSAR Environ Res. 2008 Jan-Mar;19(1-2):1-9. doi: 10.1080/10629360701843540.
New data, tools, and services recently made available on the web server (http://cactus.nci.nih.gov) of the Computer-Aided Drug Design (CADD) Group, NCI, NIH, are presented. They were developed in the context of chemoinformatics and drug development work and are designed for searching for structures in very large databases of small molecules. One of them is a web service, the Chemical Structure Look-up Service (CSLS), for very rapid structure look-up in an aggregated collection of more than 80 databases comprising more than 27 million unique structures at the time of this writing. CSLS contains pointers to entries in toxicology-related databases, catalogues of commercially available samples, drugs, assay results data sets, and databases in several other categories. It allows the user to find out very rapidly in which of these databases a given structure occurs, independent of the representation of the input structure, by making use of InChIs as well as new CACTVS hashcode-based identifiers. The latter are calculable identifiers designed to take into account tautomerism, different resonance structures drawn for charged species, and the presence of additional fragments. They make possible fine-tunable yet rapid compound identification and database overlap analyses in very large compound collections.
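The look-up idea described in the abstract, finding which databases contain a given structure by way of a representation-independent identifier, can be illustrated with a toy inverted index. This is a minimal sketch and not the CSLS implementation: the class `StructureIndex`, its methods, and the database names are hypothetical, and the identifiers are assumed to be precomputed elsewhere (standard InChI strings are used here for illustration; the CACTVS hashcode-based identifiers are not reproduced).

```python
from collections import defaultdict
from itertools import combinations


class StructureIndex:
    """Toy inverted index (hypothetical): identifier -> databases containing it."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, identifier, database):
        # Register that `database` holds an entry with this identifier.
        self._index[identifier].add(database)

    def lookup(self, identifier):
        # Return the sorted names of all databases containing the identifier.
        return sorted(self._index.get(identifier, set()))

    def overlap(self):
        # Count, for every pair of databases, how many identifiers they share.
        counts = defaultdict(int)
        for databases in self._index.values():
            for pair in combinations(sorted(databases), 2):
                counts[pair] += 1
        return dict(counts)


# Identifiers are assumed to be computed beforehand by a cheminformatics toolkit.
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
BENZENE = "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"

index = StructureIndex()
# Database names below are placeholders, not actual CSLS sources.
index.add(ETHANOL, "tox_db")
index.add(ETHANOL, "vendor_catalog")
index.add(BENZENE, "tox_db")
index.add(BENZENE, "vendor_catalog")
index.add(BENZENE, "assay_db")

print(index.lookup(ETHANOL))
print(index.overlap())
```

Because the index is keyed on the identifier alone, any input representation that normalizes to the same identifier resolves to the same set of databases. Under that assumption, switching to a coarser identifier (for example, one that ignores tautomeric form or additional fragments) would only change how the keys are computed, not the index itself.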