Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization.

Affiliation

Department of Biology at the University of Waterloo, Waterloo, Ontario, Canada.

Publication information

BMC Bioinformatics. 2010 Oct 26;11 Suppl 8(Suppl 8):S4. doi: 10.1186/1471-2105-11-S8-S4.

Abstract

UNLABELLED

This paper demonstrates how a Neural Grammar Network (NGN) learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis of previously studied datasets, we introduce three new datasets (BBB, FXa, and toxicology) to show the generality of the approach. A new experimental methodology is developed and applied to both the new and the previously studied datasets. This methodology is rigorous and statistically grounded, and culminates in a Wilcoxon significance test that demonstrates the effectiveness of the system. We further include a complete generalization of the specific technique to arbitrary grammars and datasets, using a mathematical abstraction that allows researchers in different domains to apply the method to their own work.
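The abstract names a Wilcoxon significance test without describing its setup. As a minimal sketch, assuming the paired (signed-rank) variant applied to per-dataset scores of two methods, the test could look like the following. The function names and the accuracy figures are hypothetical and are not taken from the paper.

```python
from itertools import product


def signed_ranks(a, b):
    """Return (differences, ranks of |difference|) for paired samples.

    Zero differences are dropped; tied magnitudes share their average rank.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return diffs, ranks


def wilcoxon_signed_rank(a, b):
    """Return (W_plus, W_minus, exact two-sided p-value)."""
    diffs, ranks = signed_ranks(a, b)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    observed = min(w_plus, w_minus)
    total = sum(ranks)
    # Under the null hypothesis every sign pattern is equally likely,
    # so enumerate all 2^n patterns (fine for a dozen or so datasets).
    extreme = 0
    for signs in product((False, True), repeat=len(ranks)):
        negative_sum = sum(r for r, s in zip(ranks, signs) if s)
        if min(negative_sum, total - negative_sum) <= observed:
            extreme += 1
    return w_plus, w_minus, extreme / 2 ** len(ranks)


# Made-up accuracies (percent) on eight hypothetical datasets.
method_a = [82, 75, 91, 68, 88, 79, 85, 73]
method_b = [78, 74, 85, 70, 80, 76, 79, 72]
w_plus, w_minus, p_value = wilcoxon_signed_rank(method_a, method_b)
print(w_plus, w_minus, p_value)
```

The exact enumeration avoids any dependency beyond the standard library and stays cheap for a benchmark suite of this size; for larger samples a normal approximation or a statistics library would be the usual choice.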

BACKGROUND

Our work can be viewed as an alternative to existing methods for solving the quantitative structure-activity relationship (QSAR) problem. To this end, we review a number of approaches from both a methodological and a performance perspective. In addition to these approaches, we also examined a number of chemical properties that can be used by generic classifier systems, such as feed-forward artificial neural networks. In studying these approaches, we identified a set of interesting benchmark problem sets to which many of the above approaches had been applied. These included: ACE, AChE, AR, BBB, BZR, Cox2, DHFR, ER, FXa, GPB, Therm, and Thr. Finally, we developed our own benchmark set by collecting data on toxicology.

RESULTS

Our results show that our system performs better than, or comparably to, the existing methods over a broad range of problem types. Our method does not require the expert knowledge that is necessary to apply the other methods to novel problems.

CONCLUSIONS

We conclude that our success is due to the ability of our system to: 1) encode molecules losslessly before presentation to the learning system, and 2) leverage the design of molecular description languages to facilitate the identification of relevant structural attributes of the molecules over different problem domains.
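To make the idea of lossless encoding concrete, here is a minimal sketch that assumes SMILES as the molecular description language (the abstract does not name one). It splits a string into grammar-level tokens and checks that joining the tokens reproduces the input exactly. The token pattern covers only a small subset of SMILES, the names are hypothetical, and this is not the NGN's actual encoder.

```python
import re

# Simplified SMILES token pattern (an illustrative subset, not the full grammar):
# bracket atoms, two-letter halogens, two-digit ring closures, single-letter
# atoms, ring-closure digits, and bond/branch symbols.
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[=#\-+()/\\.@:~*$]")


def tokenize(smiles):
    """Split a SMILES string into tokens, refusing any lossy split."""
    tokens = TOKEN.findall(smiles)
    # Lossless means the token sequence carries the whole string:
    # concatenating the tokens must give back the original input.
    if "".join(tokens) != smiles:
        raise ValueError("unrecognized characters in %r" % smiles)
    return tokens


aspirin = "CC(=O)Oc1ccccc1C(=O)O"
tokens = tokenize(aspirin)
print(tokens)
print("".join(tokens) == aspirin)
```

By contrast, a fixed-length descriptor vector generally cannot be inverted back to the molecule, which is the kind of information loss the conclusion refers to.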

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dfeb/2966291/7fd7c3977fa5/1471-2105-11-S8-S4-1.jpg
