Department of Biology at the University of Waterloo, Waterloo, Ontario, Canada.
BMC Bioinformatics. 2010 Oct 26;11 Suppl 8(Suppl 8):S4. doi: 10.1186/1471-2105-11-S8-S4.
This paper demonstrates how a Neural Grammar Network learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis of previously studied datasets, we introduce three new datasets (BBB, FXa, and toxicology) to show the generality of the approach. A new experimental methodology is developed and applied to both the new and the previously studied datasets. This methodology is rigorous and statistically grounded, and culminates in a Wilcoxon significance test that demonstrates the effectiveness of the system. We further include a complete generalization of the specific technique to arbitrary grammars and datasets, using a mathematical abstraction that allows researchers in different domains to apply the method to their own work.
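As an illustration of the final step of such a methodology, the following is a minimal sketch of a Wilcoxon test statistic for two methods evaluated on the same benchmark sets. It assumes the paired signed-rank variant, since the abstract does not say which Wilcoxon test was used. The function name `wilcoxon_signed_rank` and the scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def wilcoxon_signed_rank(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, int]:
    """Return (W_plus, W_minus, n) for paired samples a and b.

    Assumption: paired signed-rank variant. Zero differences are dropped
    and tied absolute differences receive their average rank.
    """
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")

    # Signed differences, excluding pairs where both methods score the same.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)

    # Rank the absolute differences (1-based), averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Sum the ranks separately for positive and negative differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, n


# Hypothetical per-dataset scores (percent correct) for two methods.
method_a = [81, 77, 90, 68, 74]
method_b = [79, 78, 85, 68, 70]
print(wilcoxon_signed_rank(method_a, method_b))
```

The smaller of the two rank sums is then compared against a critical value for `n` (or converted to a p-value) to decide whether the difference between the methods is significant.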
Our work can be viewed as an alternative to existing methods for solving the quantitative structure-activity relationship (QSAR) problem. To this end, we review a number of approaches from both a methodological and a performance perspective. In addition to these approaches, we also examined a number of chemical properties that can be used by generic classifier systems, such as feed-forward artificial neural networks. In studying these approaches, we identified a set of interesting benchmark problem sets to which many of the above approaches had been applied. These included: ACE, AChE, AR, BBB, BZR, Cox2, DHFR, ER, FXa, GPB, Therm, and Thr. Finally, we developed our own benchmark set by collecting data on toxicology.
Our results show that our system performs better than, or comparably to, the existing methods over a broad range of problem types. Our method does not require the expert knowledge that is necessary to apply the other methods to novel problems.
We conclude that our success is due to the ability of our system to: 1) encode molecules losslessly before presentation to the learning system, and 2) leverage the design of molecular description languages to facilitate the identification of relevant structural attributes of the molecules over different problem domains.
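To make the idea of lossless encoding concrete, the following is a minimal sketch of splitting a molecular description string into tokens that can be rejoined into exactly the original string. It assumes SMILES as the description language, which this abstract does not name. The token pattern covers only a simplified subset of SMILES and is not the authors' grammar, and the function name `tokenize_smiles` is hypothetical.

```python
import re
from typing import List

# Simplified SMILES token pattern (an assumption, not a full grammar):
# bracket atoms, two-letter halogens, organic-subset atoms, aromatic atoms,
# ring-closure labels, bonds, the disconnection dot, and branch parentheses.
_TOKEN = re.compile(
    r"\[[^\]]*\]|Br|Cl|[BCNOPSFI*]|[bcnops]|%\d\d|\d|[-=#$:/\\.()]"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens without discarding any character."""
    tokens = _TOKEN.findall(smiles)
    # Lossless check: joining the tokens must reproduce the input exactly.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in {smiles!r}")
    return tokens


# Aspirin, written as a SMILES string.
print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```

Because every character of the input ends up in some token, no structural information is discarded before the sequence reaches a learning system, in contrast to fixed-length descriptor vectors that summarize a molecule.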