Department of Life Science Informatics, Rheinische Friedrich-Wilhelms-Universität, D-53113 Bonn, Germany.
Drug Discov Today. 2010 Aug;15(15-16):630-9. doi: 10.1016/j.drudis.2010.06.004. Epub 2010 Jun 12.
Computational data mining and visualization techniques play a central part in the extraction of structure-activity relationship (SAR) information from compound sets including high-throughput screening data. Standard statistical and classification techniques can be used to organize data sets and evaluate the chemical neighborhood of potent hits; however, such methods are limited in their ability to extract complex SAR patterns from data sets and make them readily accessible to medicinal chemists. Therefore, new approaches and data structures are being developed that explicitly focus on molecular structure and its relationship to biological activity across multiple targets. Here, we review standard techniques for compound data analysis and describe new data structures and computational tools for SAR mining of large compound data sets.