Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:1053-1057. doi: 10.1109/EMBC48229.2022.9871018.
Data harmonization is one of the greatest challenges in cancer imaging studies, especially when data are provided by multiple sources. Properly integrated multi-source data support data fairness and yield a trusted dataset that strengthens AI engine development. To this end, we present a data integration quality check tool that ensures all data uploaded to the repository are homogenized and follow the same principles. The tool aims to report human-induced errors and propose corrective actions. It checks the data before upload to the repository at five levels: (i) clinical metadata integrity, (ii) template-imaging consistency, (iii) anonymization protocol applied, (iv) imaging analysis requirements, and (v) case completeness. The tool produces reports listing the corrective actions the user must follow. In this way, the data made available to the developers of the AI engine are homogenized, properly structured, and contain all the information needed for the analysis. The tool was validated in two rounds, internal and external, and at the user experience level. Clinical Relevance: Supporting the harmonized preparation and provision of medical imaging data and related clinical data will ensure data fairness and enhance AI development.
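The five check levels could be organized roughly as in the sketch below. This is an illustration only, not the tool described in the abstract: the `Case` and `Finding` types, the required clinical fields, the list of identifying header fields, the required modality, and the definition of a complete case are all assumptions made for the example. The abstract does not specify any of them.

```python
from dataclasses import dataclass

# All names and rules below are hypothetical; the abstract does not define them.
REQUIRED_CLINICAL = ("age", "sex", "diagnosis")         # assumed template fields
IDENTIFYING_FIELDS = ("PatientName", "PatientBirthDate")  # assumed fields that must be blank
REQUIRED_MODALITY = "MR"                                 # assumed analysis requirement


@dataclass
class Finding:
    level: str              # which of the five check levels raised it
    problem: str            # what was found
    corrective_action: str  # what the user should do before upload


@dataclass
class Case:
    case_id: str
    clinical: dict          # clinical metadata taken from the template
    template_series: list   # series names the template says exist
    image_headers: dict     # series name -> dict of image header fields


def check_case(case):
    """Run the five pre-upload checks and return a list of findings."""
    findings = []

    # (i) clinical metadata integrity: required fields must be filled in
    for name in REQUIRED_CLINICAL:
        if case.clinical.get(name) in (None, ""):
            findings.append(Finding("clinical metadata integrity",
                                    f"missing clinical field '{name}'",
                                    f"fill in '{name}' in the template"))

    # (ii) template-imaging consistency: template and images must agree
    listed, present = set(case.template_series), set(case.image_headers)
    for series in sorted(listed - present):
        findings.append(Finding("template-imaging consistency",
                                f"series '{series}' is in the template but has no images",
                                f"add the images for '{series}' or correct the template"))
    for series in sorted(present - listed):
        findings.append(Finding("template-imaging consistency",
                                f"series '{series}' has images but is not in the template",
                                f"add '{series}' to the template or remove the images"))

    for series, header in sorted(case.image_headers.items()):
        # (iii) anonymization protocol applied: identifying fields must be empty
        for name in IDENTIFYING_FIELDS:
            if header.get(name):
                findings.append(Finding("anonymization protocol",
                                        f"'{name}' is still set in series '{series}'",
                                        f"re-run anonymization on '{series}'"))
        # (iv) imaging analysis requirements: assumed here to be a modality check
        if header.get("Modality") != REQUIRED_MODALITY:
            findings.append(Finding("imaging analysis requirements",
                                    f"series '{series}' is not {REQUIRED_MODALITY}",
                                    f"replace '{series}' with a {REQUIRED_MODALITY} series"))

    # (v) case completeness: assumed to mean clinical data plus at least one series
    if not case.clinical or not case.image_headers:
        findings.append(Finding("case completeness",
                                "case lacks clinical data or imaging data",
                                "provide both parts before upload"))
    return findings


# Usage: a case with an empty diagnosis, a missing series, and a leftover name.
example = Case(
    case_id="case-001",
    clinical={"age": 61, "sex": "F", "diagnosis": ""},
    template_series=["T2", "DWI"],
    image_headers={"T2": {"Modality": "MR", "PatientName": "Jane Doe"}},
)
for finding in check_case(example):
    print(f"[{finding.level}] {finding.problem} -> {finding.corrective_action}")
```

Returning findings as data rather than printing them directly keeps the checks separate from the report format, so the same results could feed a per-case report of corrective actions like the one the abstract describes.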