Schwabe Daniel, Becker Katinka, Seyferth Martin, Klaß Andreas, Schaeffter Tobias
Division Medical Physics and Metrological Information Technology, Physikalisch-Technische Bundesanstalt, Berlin, Germany.
Department of Medical Engineering, Technical University Berlin, Berlin, Germany.
NPJ Digit Med. 2024 Aug 3;7(1):203. doi: 10.1038/s41746-024-01196-4.
The adoption of machine learning (ML) and, more specifically, deep learning (DL) applications into all major areas of our lives is underway. The development of trustworthy AI is especially important in medicine because of its large implications for patients' lives. While trustworthiness covers various aspects, including ethical, transparency and safety requirements, we focus on the importance of data quality (training/test) in DL. Since data quality dictates the behaviour of ML products, evaluating data quality will play a key part in the regulatory approval of medical ML products. We perform a systematic review following PRISMA guidelines using the databases Web of Science, PubMed and ACM Digital Library. We identify 5408 studies, of which 120 records fulfil our eligibility criteria. From this literature, we synthesise the existing knowledge on data quality frameworks and combine it with the perspective of ML applications in medicine. As a result, we propose the METRIC-framework, a specialised data quality framework for medical training data comprising 15 awareness dimensions along which developers of medical ML applications should investigate the content of a dataset. This knowledge helps to reduce biases, a major source of unfairness, to increase robustness and to facilitate interpretability, and thus lays the foundation for trustworthy AI in medicine. The METRIC-framework may serve as a base for systematically assessing training datasets, establishing reference datasets, and designing test datasets, which has the potential to accelerate the approval of medical ML products.