Classifiers and their Metrics Quantified.

Affiliation

Kyoto University Graduate School of Medicine, Laboratory of Molecular Biosciences, 606-8501, E-109 Konoemachi, Sakyo, Kyoto, Japan.

Publication information

Mol Inform. 2018 Jan;37(1-2). doi: 10.1002/minf.201700127. Epub 2018 Jan 23.

Abstract

Molecular modeling frequently constructs classification models for the prediction of two-class entities, such as compound bio(in)activity, chemical property (non)existence, protein (non)interaction, and so forth. The models are evaluated using well-known metrics such as accuracy or true positive rates. However, these frequently used metrics applied to retrospective and/or artificially generated prediction datasets can potentially overestimate true performance in actual prospective experiments. Here, we systematically consider metric value surface generation as a consequence of data balance, and propose the computation of an inverse cumulative distribution function taken over a metric surface. The proposed distribution analysis can aid in the selection of metrics when formulating study design. In addition to theoretical analyses, a practical example in chemogenomic virtual screening highlights the care required in metric selection and interpretation.
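The abstract does not spell out how a metric surface or its inverse cumulative distribution is computed, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes the surface is the metric evaluated at every possible (true positive, true negative) outcome for a fixed number of positives and negatives, and that the inverse cumulative distribution at a threshold is the fraction of that surface whose value is at least the threshold. All function names are hypothetical.

```python
from fractions import Fraction


def accuracy(tp, tn, p, n):
    # Fraction of all examples classified correctly.
    return Fraction(tp + tn, p + n)


def true_positive_rate(tp, tn, p, n):
    # Fraction of positives recovered; independent of tn.
    return Fraction(tp, p)


def metric_surface(metric, p, n):
    # Assumption: the "surface" is the metric evaluated on the full grid of
    # possible outcomes, tp in 0..p and tn in 0..n, for a fixed data balance.
    return [[metric(tp, tn, p, n) for tn in range(n + 1)]
            for tp in range(p + 1)]


def inverse_cdf(surface, threshold):
    # Assumption: fraction of grid points whose metric value >= threshold.
    values = [v for row in surface for v in row]
    return Fraction(sum(v >= threshold for v in values), len(values))


if __name__ == "__main__":
    threshold = Fraction(9, 10)
    # Compare an imbalanced dataset (10 positives, 90 negatives)
    # with a balanced one (50 positives, 50 negatives).
    for p, n in [(10, 90), (50, 50)]:
        for metric in (accuracy, true_positive_rate):
            share = inverse_cdf(metric_surface(metric, p, n), threshold)
            print(p, n, metric.__name__, float(share))
```

Under these assumptions, comparing the printed shares for different class balances gives a rough sense of how easy it is to reach a given metric value from data balance alone, which is the kind of consideration the abstract suggests should inform metric selection.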

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/357b/5838539/bda738f34778/MINF-37-na-g002.jpg
