Optimal classification and generalized prevalence estimates for diagnostic settings with more than two classes.

Author information

Luke Rayanne A, Kearsley Anthony J, Patrone Paul N

Affiliations

Johns Hopkins University, Department of Applied Mathematics and Statistics, Baltimore, 21218, MD, USA; National Institute of Standards and Technology, Information Technology Laboratory, Gaithersburg, 20899, MD, USA.

National Institute of Standards and Technology, Information Technology Laboratory, Gaithersburg, 20899, MD, USA.

Publication information

Math Biosci. 2023 Apr;358:108982. doi: 10.1016/j.mbs.2023.108982. Epub 2023 Feb 17.

Abstract

An accurate multiclass classification strategy is crucial to interpreting antibody tests. However, traditional methods based on confidence intervals or receiver operating characteristics lack clear extensions to settings with more than two classes. We address this problem by developing a multiclass classification based on probabilistic modeling and optimal decision theory that minimizes the convex combination of false classification rates. The classification process is challenging when the relative fraction of the population in each class, or generalized prevalence, is unknown. Thus, we also develop a method for estimating the generalized prevalence of test data that is independent of classification of the test data. We validate our approach on serological data with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) naïve, previously infected, and vaccinated classes. Synthetic data are used to demonstrate that (i) prevalence estimates are unbiased and converge to true values and (ii) our procedure applies to arbitrary measurement dimensions. In contrast to the binary problem, the multiclass setting offers wide-reaching utility as the most general framework and provides new insight into prevalence estimation best practices.
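As a rough illustration of the two ideas in the abstract, the sketch below pairs a prevalence-weighted classification rule with a prevalence estimate that never classifies the test samples. It is a minimal sketch under stated assumptions, not the authors' implementation: the one-dimensional Gaussian class models (`MEANS`, `STDS`), the fixed partition `THRESHOLDS`, and all function names are hypothetical. The classifier is the standard rule that assigns a sample to the class maximizing prevalence times conditional density, and the prevalence estimate solves the linear system Q = P q, where P[j, k] is the fraction of class-k training samples falling in region j and Q[j] is the fraction of test samples in that region.

```python
import numpy as np

# Hypothetical 1-D class-conditional models (assumed, not from the paper):
# one Gaussian per class, e.g. naive, previously infected, vaccinated.
MEANS = np.array([0.0, 3.0, 6.0])
STDS = np.array([1.0, 1.0, 1.0])
# Hypothetical fixed partition of the measurement axis into three regions.
THRESHOLDS = [1.5, 4.5]


def class_densities(x):
    """Return an (n_samples, n_classes) array of Gaussian densities."""
    z = (np.asarray(x)[:, None] - MEANS[None, :]) / STDS[None, :]
    return np.exp(-0.5 * z**2) / (STDS[None, :] * np.sqrt(2.0 * np.pi))


def classify(x, prevalence):
    """Assign each sample to the class maximizing prevalence * density."""
    weighted = class_densities(x) * np.asarray(prevalence)[None, :]
    return np.argmax(weighted, axis=1)


def region_fractions(x, thresholds):
    """Fraction of samples falling in each region of a fixed partition."""
    idx = np.digitize(x, thresholds)
    return np.bincount(idx, minlength=len(thresholds) + 1) / len(x)


def estimate_prevalence(x_test, train_by_class, thresholds):
    """Solve Q = P q without classifying any test sample."""
    P = np.column_stack(
        [region_fractions(x, thresholds) for x in train_by_class]
    )
    Q = region_fractions(x_test, thresholds)
    return np.linalg.solve(P, Q)


def sample(rng, prevalence, n):
    """Draw n synthetic measurements and their true class labels."""
    labels = rng.choice(len(MEANS), size=n, p=prevalence)
    return rng.normal(MEANS[labels], STDS[labels]), labels


# Synthetic demonstration with an assumed true generalized prevalence.
rng = np.random.default_rng(0)
true_q = np.array([0.5, 0.3, 0.2])
train = [rng.normal(m, s, 20000) for m, s in zip(MEANS, STDS)]
x_test, y_test = sample(rng, true_q, 20000)

q_hat = estimate_prevalence(x_test, train, THRESHOLDS)  # prevalence first
pred = classify(x_test, q_hat)                          # then classify
accuracy = (pred == y_test).mean()
```

Because every column of P sums to one and Q sums to one, the solved prevalence vector also sums to one whenever P is invertible. The estimate is computed before classification and then fed into the classifier, which mirrors the ordering the abstract describes.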
