
Large-scale learning of structure-activity relationships using a linear support vector machine and problem-specific metrics.

Affiliation

Center for Bioinformatics (ZBIT), University of Tübingen, Tübingen, Germany.

Publication information

J Chem Inf Model. 2011 Feb 28;51(2):203-13. doi: 10.1021/ci100073w. Epub 2011 Jan 5.

DOI: 10.1021/ci100073w
PMID: 21207929
Abstract

The goal of this study was to adapt a recently proposed large-scale linear support vector machine to large-scale binary cheminformatics classification problems and to assess its performance on various benchmarks using virtual screening performance measures. We extended the large-scale linear support vector machine library LIBLINEAR with state-of-the-art virtual high-throughput screening metrics to train classifiers on whole large and unbalanced data sets. This linear support vector machine formulation performs excellently when applied to high-dimensional sparse feature vectors. An additional advantage is that the cost of a prediction is, on average, linear in the number of non-zero features. Nevertheless, the approach assumes that a problem is linearly separable. Therefore, we conducted extensive benchmarking to evaluate the performance on large-scale problems of up to 175,000 samples. To examine the virtual screening performance, we determined the chemotype clusters using Feature Trees and used this information to compute weighted AUC-based performance measures and to perform a leave-cluster-out cross-validation. We also considered the BEDROC score, a metric that was suggested to tackle the early enrichment problem. The performance on each problem was evaluated by a nested cross-validation and a nested leave-cluster-out cross-validation. We compared LIBLINEAR against a Naïve Bayes classifier, a random decision forest classifier, and a maximum similarity ranking approach. LIBLINEAR outperformed these reference approaches in a direct comparison. A comparison with literature results showed that LIBLINEAR is competitive but does not achieve results as good as the top-ranked nonlinear machines on these benchmarks. However, considering the overall convincing performance and computation time of the large-scale support vector machine, the approach provides an excellent alternative to established large-scale classification approaches.
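
As an illustration of two evaluation ingredients named in the abstract, the sketch below computes a BEDROC score and builds leave-cluster-out folds in plain Python. It is not the authors' implementation and not part of LIBLINEAR: the function names `bedroc` and `leave_cluster_out_splits`, the default `alpha=20.0`, and the one-cluster-per-fold scheme are assumptions made for this example. BEDROC is written here as the robust initial enhancement (RIE) rescaled between its worst and best possible values.

```python
import math


def bedroc(scores, labels, alpha=20.0):
    """BEDROC for a ranking, computed as min-max normalized RIE.

    scores: higher means "more likely active" (assumption of this sketch).
    labels: 1 for active, 0 for inactive.
    alpha:  early-recognition weight; 20.0 is an assumed default.
    Ties in scores are broken by input order (Python's sort is stable).
    """
    n_total = len(labels)
    n_active = sum(1 for y in labels if y)
    if n_active == 0 or n_active == n_total:
        raise ValueError("BEDROC needs at least one active and one inactive")

    # Rank compounds from best to worst score.
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=True)

    # Sum of exponentially decaying weights over the 1-based ranks of actives.
    weight_sum = sum(
        math.exp(-alpha * (rank + 1) / n_total)
        for rank, i in enumerate(order)
        if labels[i]
    )

    # RIE: observed weight sum relative to its expectation under random ranking.
    random_mean = (1.0 / n_total) * (
        (1.0 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1.0)
    )
    rie = weight_sum / (n_active * random_mean)

    # Best and worst achievable RIE for this fraction of actives.
    ratio = n_active / n_total
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)


def leave_cluster_out_splits(cluster_ids):
    """Yield (cluster_id, train_indices, test_indices), one fold per cluster.

    cluster_ids: one chemotype-cluster label per compound. How the clusters
    are obtained (Feature Trees in the abstract) is outside this sketch.
    """
    members = {}
    for idx, cid in enumerate(cluster_ids):
        members.setdefault(cid, []).append(idx)
    for cid, test in members.items():  # clusters in order of first appearance
        held_out = set(test)
        train = [i for i in range(len(cluster_ids)) if i not in held_out]
        yield cid, train, test
```

With this sketch, a ranking that places every active ahead of every inactive scores 1, and the reverse ranking scores 0. Holding out whole clusters means a model is always tested on chemotypes it has not seen during training, which is the point of a leave-cluster-out validation.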

Similar articles

1
Large-scale learning of structure-activity relationships using a linear support vector machine and problem-specific metrics.
J Chem Inf Model. 2011 Feb 28;51(2):203-13. doi: 10.1021/ci100073w. Epub 2011 Jan 5.
2
Naïve Bayes classification using 2D pharmacophore feature triplet vectors.
J Chem Inf Model. 2008 Jan;48(1):166-78. doi: 10.1021/ci7003253. Epub 2008 Jan 10.
3
Potency-directed similarity searching using support vector machines.
Chem Biol Drug Des. 2011 Jan;77(1):30-8. doi: 10.1111/j.1747-0285.2010.01059.x. Epub 2010 Nov 29.
4
Statistical geometry based prediction of nonsynonymous SNP functional effects using random forest and neuro-fuzzy classifiers.
Proteins. 2008 Jun;71(4):1930-9. doi: 10.1002/prot.21838.
5
StructRank: a new approach for ligand-based virtual screening.
J Chem Inf Model. 2011 Jan 24;51(1):83-92. doi: 10.1021/ci100308f. Epub 2010 Dec 17.
6
A multiple kernel support vector machine scheme for feature selection and rule extraction from gene expression data of cancer tissue.
Artif Intell Med. 2007 Oct;41(2):161-75. doi: 10.1016/j.artmed.2007.07.008. Epub 2007 Sep 11.
7
Combining machine learning and pharmacophore-based interaction fingerprint for in silico screening.
J Chem Inf Model. 2010 Jan;50(1):170-85. doi: 10.1021/ci900382e.
8
Predicting antitrichomonal activity: a computational screening using atom-based bilinear indices and experimental proofs.
Bioorg Med Chem. 2006 Oct 1;14(19):6502-24. doi: 10.1016/j.bmc.2006.06.016. Epub 2006 Jul 27.
9
Classification of highly unbalanced CYP450 data of drugs using cost sensitive machine learning techniques.
J Chem Inf Model. 2007 Jan-Feb;47(1):92-103. doi: 10.1021/ci6002619.
10
Virtual screening of Abl inhibitors from large compound libraries by support vector machines.
J Chem Inf Model. 2009 Sep;49(9):2101-10. doi: 10.1021/ci900135u.

Cited by

1
Longitudinal home-cage automated assessment of climbing behavior shows sexual dimorphism and aging-related decrease in C57BL/6J healthy mice and allows early detection of motor impairment in the N171-82Q mouse model of Huntington's disease.
Front Behav Neurosci. 2023 Mar 22;17:1148172. doi: 10.3389/fnbeh.2023.1148172. eCollection 2023.
2
Decoding continuous variables from event-related potential (ERP) data with linear support vector regression using the Decision Decoding Toolbox (DDTBOX).
Front Neurosci. 2022 Nov 3;16:989589. doi: 10.3389/fnins.2022.989589. eCollection 2022.
3
Representation of Cone-Opponent Color Space in Macaque Early Visual Cortices.
Front Neurosci. 2022 Jun 20;16:891247. doi: 10.3389/fnins.2022.891247. eCollection 2022.
4
Using Machine Learning Algorithms to Predict Hospital Acquired Thrombocytopenia after Operation in the Intensive Care Unit: A Retrospective Cohort Study.
Diagnostics (Basel). 2021 Sep 3;11(9):1614. doi: 10.3390/diagnostics11091614.
5
Machine Learning Methods in Drug Discovery.
Molecules. 2020 Nov 12;25(22):5277. doi: 10.3390/molecules25225277.
6
Segmentation and Classification in Digital Pathology for Glioma Research: Challenges and Deep Learning Approaches.
Front Neurosci. 2020 Feb 21;14:27. doi: 10.3389/fnins.2020.00027. eCollection 2020.
7
sEMG-Based Trunk Compensation Detection in Rehabilitation Training.
Front Neurosci. 2019 Nov 21;13:1250. doi: 10.3389/fnins.2019.01250. eCollection 2019.
8
Automated, Efficient, and Accelerated Knowledge Modeling of the Cognitive Neuroimaging Literature Using the ATHENA Toolkit.
Front Neurosci. 2019 May 15;13:494. doi: 10.3389/fnins.2019.00494. eCollection 2019.
9
Three-Dimensional Biologically Relevant Spectrum (BRS-3D): Shape Similarity Profile Based on PDB Ligands as Molecular Descriptors.
Molecules. 2016 Nov 17;21(11):1554. doi: 10.3390/molecules21111554.
10
Prediction of G Protein-Coupled Receptors with SVM-Prot Features and Random Forest.
Scientifica (Cairo). 2016;2016:8309253. doi: 10.1155/2016/8309253. Epub 2016 Jul 27.