Automated detection of poor-quality data: case studies in healthcare.

Affiliations

Presagen, Adelaide, SA, 5000, Australia.

School of Mathematical Sciences, The University of Adelaide, Adelaide, SA, 5000, Australia.

Publication information

Sci Rep. 2021 Sep 9;11(1):18005. doi: 10.1038/s41598-021-97341-0.

Abstract

The detection and removal of poor-quality data in a training set is crucial to achieve high-performing AI models. In healthcare, data can be inherently poor-quality due to uncertainty or subjectivity, but as is often the case, the requirement for data privacy restricts AI practitioners from accessing raw training data, meaning manual visual verification of private patient data is not possible. Here we describe a novel method for automated identification of poor-quality data, called Untrainable Data Cleansing. This method is shown to have numerous benefits including protection of private patient data; improvement in AI generalizability; reduction in time, cost, and data needed for training; all while offering a truer reporting of AI performance itself. Additionally, results show that Untrainable Data Cleansing could be useful as a triage tool to identify difficult clinical cases that may warrant in-depth evaluation or additional testing to support a diagnosis.
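The abstract does not say how Untrainable Data Cleansing identifies poor-quality samples. As a rough illustration of the general idea of flagging data that a model consistently fails to learn, the following minimal sketch assumes a repeated k-fold cross-validation scheme and flags samples that are misclassified every time they are held out. The flagging rule, the nearest-centroid stand-in classifier, the toy data, and every function name here are assumptions made for illustration, not the authors' method.

```python
import random
from collections import defaultdict


def nearest_centroid_predict(train, point):
    """Predict the label whose class centroid is closest to `point`.

    A deliberately simple stand-in classifier (an assumption, not the
    model used in the paper).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), label in train:
        s = sums[label]
        s[0] += x
        s[1] += y
        s[2] += 1

    def dist2(label):
        sx, sy, n = sums[label]
        return (point[0] - sx / n) ** 2 + (point[1] - sy / n) ** 2

    return min(sums, key=dist2)


def flag_untrainable(data, k=5, repeats=10, seed=0):
    """Return indices of samples misclassified in every held-out evaluation.

    Hypothetical flagging rule: each sample is held out exactly once per
    repeat, so a miss count equal to `repeats` means it was never predicted
    correctly from the remaining data.
    """
    rng = random.Random(seed)
    n = len(data)
    miss_counts = [0] * n
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for fold in folds:
            held_out = set(fold)
            train = [data[i] for i in range(n) if i not in held_out]
            for i in fold:
                features, label = data[i]
                if nearest_centroid_predict(train, features) != label:
                    miss_counts[i] += 1
    return [i for i, count in enumerate(miss_counts) if count == repeats]


def make_toy_data():
    """Two well-separated synthetic clusters with two labels flipped on purpose."""
    data = [((i * 0.1, (i % 3) * 0.1), 0) for i in range(10)]
    data += [((5 + i * 0.1, 5 + (i % 3) * 0.1), 1) for i in range(10)]
    data[3] = (data[3][0], 1)    # deliberately mislabeled
    data[15] = (data[15][0], 0)  # deliberately mislabeled
    return data


if __name__ == "__main__":
    print(flag_untrainable(make_toy_data()))
```

On this toy set the sketch flags the two deliberately mislabeled samples. Real clinical data would call for a far stronger model and a more careful threshold than "misclassified in every repeat".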

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/830b/8429593/8367f36fe47c/41598_2021_97341_Fig1_HTML.jpg
