
Automated detection of poor-quality data: case studies in healthcare.

Affiliations

Presagen, Adelaide, SA, 5000, Australia.

School of Mathematical Sciences, The University of Adelaide, Adelaide, SA, 5000, Australia.

Publication information

Sci Rep. 2021 Sep 9;11(1):18005. doi: 10.1038/s41598-021-97341-0.

Abstract

The detection and removal of poor-quality data in a training set is crucial to achieving high-performing AI models. In healthcare, data can be inherently of poor quality owing to uncertainty or subjectivity, and data privacy requirements often prevent AI practitioners from accessing raw training data, so manual visual verification of private patient data is not possible. Here we describe a novel method for automated identification of poor-quality data, called Untrainable Data Cleansing. This method is shown to have numerous benefits, including protection of private patient data; improvement in AI generalizability; reduction in the time, cost, and data needed for training; and a truer reporting of AI performance itself. Additionally, results show that Untrainable Data Cleansing could be useful as a triage tool to identify difficult clinical cases that may warrant in-depth evaluation or additional testing to support a diagnosis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/830b/8429593/8367f36fe47c/41598_2021_97341_Fig1_HTML.jpg
