Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization.

Affiliation

Department of Biology at the University of Waterloo, Waterloo, Ontario, Canada.

Publication information

BMC Bioinformatics. 2010 Oct 26;11 Suppl 8(Suppl 8):S4. doi: 10.1186/1471-2105-11-S8-S4.

DOI:10.1186/1471-2105-11-S8-S4

PMID:21034429

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC2966291/

Abstract

This paper demonstrates how a Neural Grammar Network (NGN) learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis of previously studied datasets, we introduce three new datasets (BBB, FXa, and toxicology) to show the generality of the approach. A new experimental methodology is developed and applied to both the new and the previously studied datasets. This methodology is rigorous and statistically grounded, and culminates in a Wilcoxon significance test that supports the effectiveness of the system. We further include a complete generalization of the specific technique to arbitrary grammars and datasets, using a mathematical abstraction that allows researchers in different domains to apply the method to their own work.
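To make the kind of comparison a Wilcoxon test supports concrete, the following is a minimal sketch of an exact two-sided Wilcoxon signed-rank test on paired per-dataset scores. It is not the authors' experimental protocol, which the abstract does not specify. The function name `wilcoxon_signed_rank` and the two score lists are hypothetical, and the numbers are invented purely for illustration (they are not results from the paper).

```python
from itertools import product


def wilcoxon_signed_rank(xs, ys):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks. The null distribution is
    enumerated exhaustively, so this is only practical for small samples.
    """
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    # Test statistic: sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Under the null hypothesis each rank is equally likely to carry a
    # positive or a negative sign, so enumerate all 2**n sign patterns.
    mean = sum(ranks) / 2
    observed = abs(w_plus - mean)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - mean) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical accuracies (in percent) of two methods on eight datasets.
method_a = [81, 77, 90, 68, 85, 74, 88, 79]
method_b = [78, 75, 84, 69, 80, 70, 83, 72]
w_plus, p_value = wilcoxon_signed_rank(method_a, method_b)
print(f"W+ = {w_plus}, two-sided p = {p_value}")
```

A small p-value here would indicate that the paired differences are unlikely under the hypothesis that neither method is systematically better. For larger samples, a library routine with a normal approximation would be the more usual choice.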

BACKGROUND

Our work can be viewed as an alternative to existing methods for solving the quantitative structure-activity relationship (QSAR) problem. To this end, we reviewed a number of approaches from both a methodological and a performance perspective. In addition to these approaches, we also examined a number of chemical properties that can be used by generic classifier systems, such as feed-forward artificial neural networks. In studying these approaches, we identified a set of interesting benchmark problem sets to which many of the above approaches had been applied. These included: ACE, AChE, AR, BBB, BZR, Cox2, DHFR, ER, FXa, GPB, Therm, and Thr. Finally, we developed our own benchmark set by collecting data on toxicology.

RESULTS

Our results show that our system performs better than, or comparably to, the existing methods over a broad range of problem types. Our method does not require the expert knowledge that is necessary to apply the other methods to novel problems.

CONCLUSIONS

We conclude that our success is due to the ability of our system to: 1) encode molecules losslessly before presentation to the learning system, and 2) leverage the design of molecular description languages to facilitate the identification of relevant structural attributes of the molecules over different problem domains.
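As a rough illustration of what a lossless encoding means in point 1, the sketch below splits a molecule string into grammar-level tokens and checks that the tokens reassemble into exactly the original string. It assumes SMILES as the molecular description language, which the abstract does not name, and it uses a simplified token pattern rather than a full grammar. The names `SMILES_TOKEN` and `tokenize` are hypothetical, and this flat tokenization is far simpler than the grammar-based structures an NGN would consume.

```python
import re

# Simplified SMILES token pattern (an assumption for illustration; it does
# not cover every construct in the SMILES specification).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"      # bracket atom, e.g. [NH4+]
    r"|Br|Cl"          # two-letter organic-subset atoms
    r"|[BCNOPSFI]"     # one-letter organic-subset atoms
    r"|[bcnops]"       # aromatic atoms
    r"|%\d{2}|\d"      # ring-closure labels
    r"|[-=#$:/\\.()]"  # bonds, disconnection dot, branch parentheses
)


def tokenize(smiles):
    """Split a SMILES string into tokens without losing any character.

    Raises ValueError if any character is not covered by the pattern,
    because the encoding would then no longer be lossless.
    """
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize losslessly: {smiles!r}")
    return tokens


# Aspirin: every character ends up in exactly one token.
aspirin = "CC(=O)Oc1ccccc1C(=O)O"
tokens = tokenize(aspirin)
print(tokens)
print("".join(tokens) == aspirin)
```

The round-trip check is the point of the example: unlike a fixed set of computed descriptors, the token sequence retains everything needed to recover the original molecule string.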
