Classifiers and their Metrics Quantified.

Affiliation

Kyoto University Graduate School of Medicine, Laboratory of Molecular Biosciences, 606-8501, E-109 Konoemachi, Sakyo, Kyoto, Japan.

Publication information

Mol Inform. 2018 Jan;37(1-2). doi: 10.1002/minf.201700127. Epub 2018 Jan 23.

DOI:10.1002/minf.201700127
PMID:29360259
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5838539/

Abstract

Molecular modeling frequently constructs classification models for the prediction of two-class entities, such as compound bio(in)activity, chemical property (non)existence, protein (non)interaction, and so forth. The models are evaluated using well-known metrics such as accuracy or true positive rates. However, these frequently used metrics, when applied to retrospective and/or artificially generated prediction datasets, can overestimate true performance in actual prospective experiments. Here, we systematically consider metric value surface generation as a consequence of data balance, and propose the computation of an inverse cumulative distribution function taken over a metric surface. The proposed distribution analysis can aid in the selection of metrics when formulating study design. In addition to theoretical analyses, a practical example in chemogenomic virtual screening highlights the care required in metric selection and interpretation.
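
The idea of a metric surface and an inverse cumulative distribution over it can be illustrated with a small sketch. This is only one possible reading of the abstract, not the authors' implementation: it assumes the surface is obtained by sweeping the true positive rate and true negative rate over a regular grid at a fixed fraction of positives, and that the inverse cumulative distribution is the share of grid points whose metric value meets or exceeds a threshold. The function names, the grid resolution, and the choice of accuracy and the Matthews correlation coefficient (MCC) as example metrics are all assumptions made for illustration.

```python
from itertools import product


def accuracy(tpr, tnr, pos_fraction):
    """Accuracy implied by TPR, TNR and the fraction of positives in the data."""
    return tpr * pos_fraction + tnr * (1.0 - pos_fraction)


def mcc(tpr, tnr, pos_fraction):
    """Matthews correlation coefficient computed from rates and class balance."""
    p, n = pos_fraction, 1.0 - pos_fraction
    tp, fn = tpr * p, (1.0 - tpr) * p
    tn, fp = tnr * n, (1.0 - tnr) * n
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    # Convention assumed here: MCC is 0 when a row or column of the table is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def metric_surface(metric, pos_fraction, steps=101):
    """Metric values on a regular (TPR, TNR) grid for one class balance."""
    grid = [i / (steps - 1) for i in range(steps)]
    return [metric(tpr, tnr, pos_fraction) for tpr, tnr in product(grid, grid)]


def fraction_at_least(values, threshold):
    """Share of the surface with metric >= threshold (inverse cumulative view)."""
    return sum(v >= threshold for v in values) / len(values)


if __name__ == "__main__":
    for balance in (0.5, 0.01):  # balanced data vs. 1% positives (hypothetical)
        for name, metric in (("accuracy", accuracy), ("MCC", mcc)):
            share = fraction_at_least(metric_surface(metric, balance), 0.9)
            print(f"positives={balance:<5} {name:<8} share >= 0.9: {share:.3f}")
```

Under these assumptions, a noticeably larger share of the accuracy surface reaches 0.9 when positives are rare (roughly 10% of grid points, against roughly 2% for balanced data), because predicting the majority class well is enough. This is the kind of balance-driven inflation the abstract warns about.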

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/357b/5838539/9ba093ea9f7d/MINF-37-na-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/357b/5838539/bda738f34778/MINF-37-na-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/357b/5838539/6970d78f3451/MINF-37-na-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/357b/5838539/dc272587c521/MINF-37-na-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/357b/5838539/42f4f1fc7db8/MINF-37-na-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/357b/5838539/b1ea5e5b47a5/MINF-37-na-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/357b/5838539/f151b73c7fb5/MINF-37-na-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/357b/5838539/d0341f06e463/MINF-37-na-g008.jpg

Similar articles

1
Classifiers and their Metrics Quantified.
Mol Inform. 2018 Jan;37(1-2). doi: 10.1002/minf.201700127. Epub 2018 Jan 23.
2
General Approach to Estimate Error Bars for Quantitative Structure-Activity Relationship Predictions of Molecular Activity.
J Chem Inf Model. 2018 Aug 27;58(8):1561-1575. doi: 10.1021/acs.jcim.8b00114. Epub 2018 Jul 17.
3
The current limits in virtual screening and property prediction.
Future Med Chem. 2018 Jul 1;10(13):1623-1635. doi: 10.4155/fmc-2017-0303. Epub 2018 Jun 28.
4
Identification of Bioactive Scaffolds Based on QSAR Models.
Mol Inform. 2018 Jan;37(1-2). doi: 10.1002/minf.201700103. Epub 2017 Nov 14.
5
Chemical Structure Similarity Search for Ligand-based Virtual Screening: Methods and Computational Resources.
Curr Drug Targets. 2016;17(14):1580-1585. doi: 10.2174/1389450116666151102095555.
6
An unbiased metric of antiproliferative drug effect in vitro.
Nat Methods. 2016 Jun;13(6):497-500. doi: 10.1038/nmeth.3852. Epub 2016 May 2.
7
Scoring of de novo Designed Chemical Entities by Macromolecular Target Prediction.
Mol Inform. 2017 Jan;36(1-2). doi: 10.1002/minf.201600110. Epub 2016 Sep 19.
8
BCL::Mol2D-a robust atom environment descriptor for QSAR modeling and lead optimization.
J Comput Aided Mol Des. 2019 May;33(5):477-486. doi: 10.1007/s10822-019-00199-8. Epub 2019 Apr 6.
9
Matrix-based Molecular Descriptors for Prospective Virtual Compound Screening.
Mol Inform. 2017 Jan;36(1-2). doi: 10.1002/minf.201600091. Epub 2016 Sep 21.
10
The Development of a Weighted Index to Optimise Compound Libraries for High Throughput Screening.
Mol Inform. 2019 Mar;38(3):e1800068. doi: 10.1002/minf.201800068. Epub 2018 Oct 22.

Cited by

1
Evaluating the three-level approach of the U-smile method for imbalanced binary classification.
PLoS One. 2025 Apr 10;20(4):e0321661. doi: 10.1371/journal.pone.0321661. eCollection 2025.
2
Reliability and Validity of Self-Reported Risk Factors for Stroke and Dementia.
J Am Heart Assoc. 2025 Apr;14(7):e038730. doi: 10.1161/JAHA.124.038730. Epub 2025 Mar 21.
3
Off-the-Shelf Large Language Models for Causality Assessment of Individual Case Safety Reports: A Proof-of-Concept with COVID-19 Vaccines.
Drug Saf. 2025 Mar 12. doi: 10.1007/s40264-025-01531-y.
4
Deployment of a ordered transposon mutant library in a quorum-competent genetic background.
mBio. 2025 Apr 9;16(4):e0003625. doi: 10.1128/mbio.00036-25. Epub 2025 Feb 25.
5
A Machine Learning-Based Web Tool for the Severity Prediction of COVID-19.
BioTech (Basel). 2024 Jul 1;13(3):22. doi: 10.3390/biotech13030022.
6
Drug interaction with UDP-Glucuronosyltransferase (UGT) enzymes is a predictor of drug-induced liver injury.
Hepatology. 2025 May 1;81(5):1512-1521. doi: 10.1097/HEP.0000000000001007. Epub 2024 Jul 17.
7
Radiomics Machine Learning Analysis of Clear Cell Renal Cell Carcinoma for Tumour Grade Prediction Based on Intra-Tumoural Sub-Region Heterogeneity.
Cancers (Basel). 2024 Apr 10;16(8):1454. doi: 10.3390/cancers16081454.
8
Deployment of a ordered transposon mutant library in a quorum-competent genetic background.
bioRxiv. 2024 Apr 2:2023.10.31.564941. doi: 10.1101/2023.10.31.564941.
9
Comparison of machine learning techniques in prediction of mortality following cardiac surgery: analysis of over 220 000 patients from a large national database.
Eur J Cardiothorac Surg. 2023 Jun 1;63(6). doi: 10.1093/ejcts/ezad183.
10
A User's Guide to Machine Learning for Polymeric Biomaterials.
ACS Polym Au. 2022 Nov 17;3(2):141-157. doi: 10.1021/acspolymersau.2c00037. eCollection 2023 Apr 12.

References

1
Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric.
PLoS One. 2017 Jun 2;12(6):e0177678. doi: 10.1371/journal.pone.0177678. eCollection 2017.
2
Active learning for computational chemogenomics.
Future Med Chem. 2017 Mar;9(4):381-402. doi: 10.4155/fmc-2016-0197. Epub 2017 Mar 6.
3
The power metric: a new statistically robust enrichment-type metric for virtual screening applications with early recovery capability.
J Cheminform. 2017 Feb 2;9:7. doi: 10.1186/s13321-016-0189-4. eCollection 2017.
4
Oligosaccharyltransferase inhibition induces senescence in RTK-driven tumor cells.
Nat Chem Biol. 2016 Dec;12(12):1023-1030. doi: 10.1038/nchembio.2194. Epub 2016 Oct 3.
5
Development of a High-Throughput Gene Expression Screen for Modulators of RAS-MAPK Signaling in a Mutant RAS Cellular Context.
J Biomol Screen. 2016 Oct;21(9):989-97. doi: 10.1177/1087057116658646. Epub 2016 Jul 26.
6
Feasibility of Active Machine Learning for Multiclass Compound Classification.
J Chem Inf Model. 2016 Jan 25;56(1):12-20. doi: 10.1021/acs.jcim.5b00332. Epub 2016 Jan 7.
7
GLASS: a comprehensive database for experimentally validated GPCR-ligand associations.
Bioinformatics. 2015 Sep 15;31(18):3035-42. doi: 10.1093/bioinformatics/btv302. Epub 2015 May 13.
8
The ChEMBL bioactivity database: an update.
Nucleic Acids Res. 2014 Jan;42(Database issue):D1083-90. doi: 10.1093/nar/gkt1031. Epub 2013 Nov 7.
9
What has virtual screening ever done for drug discovery?
Expert Opin Drug Discov. 2008 Aug;3(8):841-51. doi: 10.1517/17460441.3.8.841.
10
A comparison of MCC and CEN error measures in multi-class prediction.
PLoS One. 2012;7(8):e41882. doi: 10.1371/journal.pone.0041882. Epub 2012 Aug 8.