Partial least square and k-nearest neighbor algorithms for improved 3D quantitative spectral data-activity relationship consensus modeling of acute toxicity.

Affiliation

Division of Systems Biology, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas.

Publication information

Environ Toxicol Chem. 2014 Jun;33(6):1271-82. doi: 10.1002/etc.2534. Epub 2014 Apr 9.

Abstract

A diverse set of 154 chemicals, including US Food and Drug Administration-regulated compounds, tested for aquatic toxicity in Daphnia magna was modeled by a 3-dimensional quantitative spectral data-activity relationship (3D-QSDAR). Two distinct algorithms, partial least squares (PLS) and Tanimoto similarity-based k-nearest neighbors (KNN), were used to process bin occupancy descriptor matrices obtained by tessellating the 3D-QSDAR space into regularly sized bins. The performance of models using bins ranging in size from 2 ppm × 2 ppm × 0.5 Å to 20 ppm × 20 ppm × 2.5 Å was explored. Rigorous quality-control criteria were imposed: 1) 100 randomized 20% hold-out test sets were generated, and the average test-set R² of the respective models was used as a measure of their performance; and 2) a Y-scrambling procedure was used to identify chance correlations. A consensus between the best-performing composite PLS model, using 14 ppm × 14 ppm × 0.5 Å bins and 10 latent variables (average test-set R² = 0.770), and the best composite KNN model, using 8 ppm × 8 ppm × 0.5 Å bins and 2 neighbors (average test-set R² = 0.801), offered an improvement of about 7.5% (consensus test-set R² = 0.845). Projection of the most frequently occurring bins onto the standard coordinate space indicated that the presence of aromatic systems substituted with a primary or secondary amino group would result in an increased toxic effect in Daphnia. The presence of a second aromatic ring with highly electronegative substituents 5 Å to 7 Å from the first ring would lead to a further increase in toxicity.
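
To make the KNN and consensus steps concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each compound is encoded as a binary bin-occupancy vector, that a KNN prediction is the unweighted mean activity of the k most Tanimoto-similar training compounds, and that the consensus is a simple average of the two models' predictions. The abstract does not specify the weighting or the averaging scheme, so those choices, the function names, and the toy data are all hypothetical.

```python
import numpy as np


def tanimoto(a, b):
    """Tanimoto similarity between two binary bin-occupancy vectors."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two empty vectors share no occupied bins; treat as dissimilar.
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def knn_predict(X_train, y_train, x_query, k=2):
    """Mean activity of the k training compounds most similar to x_query.

    Assumption: unweighted mean; the paper may weight neighbors differently.
    """
    sims = np.array([tanimoto(row, x_query) for row in X_train])
    # Stable sort keeps the original order among equally similar compounds.
    nearest = np.argsort(-sims, kind="stable")[:k]
    return float(np.mean(np.asarray(y_train, dtype=float)[nearest]))


def consensus(pred_a, pred_b):
    """Assumed consensus rule: arithmetic mean of two models' predictions."""
    return (np.asarray(pred_a, dtype=float) + np.asarray(pred_b, dtype=float)) / 2.0


# Toy data (hypothetical): 3 training compounds, 4 bins each.
X_train = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
])
y_train = np.array([1.0, 2.0, 4.0])  # made-up toxicity values
x_query = np.array([1, 1, 1, 0])

knn_pred = knn_predict(X_train, y_train, x_query, k=2)
pls_pred = 2.5  # placeholder standing in for a PLS model's prediction
combined = consensus([knn_pred], [pls_pred])
```

With k = 2, as in the best KNN model reported above, the query's prediction is driven entirely by its two closest neighbors. Averaging with a second model is one common way to dampen the error of either model alone.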
