Bhawsar Praphulla M S, Abubakar Mustapha, Schmidt Marjanka K, Camp Nicola J, Cessna Melissa H, Duggan Máire A, García-Closas Montserrat, Almeida Jonas S
Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Maryland, USA.
Division of Molecular Pathology, Netherlands Cancer Institute, Antoni Van Leeuwenhoek Hospital, Amsterdam, The Netherlands.
J Pathol Inform. 2021 Sep 27;12:38. doi: 10.4103/jpi.jpi_100_20. eCollection 2021.
Artificial intelligence (AI) is fast becoming the tool of choice for scalable and reliable analysis of medical images. However, constraints in sharing medical data outside the institutional or geographical space, as well as difficulties in getting AI models and modeling platforms to work across different environments, have led to a "reproducibility crisis" in digital medicine.
This study details the implementation of a web platform that can be used to mitigate these challenges by orchestrating a digital pathology AI pipeline, from raw data to model inference, entirely on the local machine. We discuss how this federated platform provides governed access to data by consuming the application programming interfaces (APIs) exposed by cloud storage services, allows the addition of user-defined annotations, facilitates active learning for training models iteratively, and provides model inference computed directly in the web browser at practically zero cost. In-browser inference is of particular relevance to clinical workflows because the code, including the AI model, travels to the user's data, which stays private to the governance domain where it was acquired.
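The active learning step mentioned above can be illustrated with a minimal sketch. The abstract does not say which selection strategy the platform uses, so the example below assumes entropy-based uncertainty sampling, one common choice. The function names, image identifiers, and probabilities are all hypothetical and are not taken from the platform's code.

```python
import math
from typing import Dict, List, Sequence


def prediction_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    # Zero-probability classes contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_annotation(
    predictions: Dict[str, Sequence[float]], batch_size: int
) -> List[str]:
    """Return the ids of the `batch_size` images the model is least sure about."""
    ranked = sorted(
        predictions,
        key=lambda image_id: prediction_entropy(predictions[image_id]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:batch_size]


# Hypothetical model outputs for three unlabeled images (class probabilities).
unlabeled = {
    "core_001": [0.98, 0.02],
    "core_002": [0.55, 0.45],
    "core_003": [0.80, 0.20],
}

# Images queued for the next round of user annotation, most uncertain first.
queue = select_for_annotation(unlabeled, batch_size=2)
```

In an iterative loop of this kind, the queued images would be annotated by the user, added to the training set, and the model retrained before the next selection round.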
We demonstrate that the web browser, backed by consumer-facing cloud infrastructure such as Box.com, can be a means of democratizing AI and advancing data socialization in medical imaging. As a case study, we test the accompanying platform end-to-end on a large dataset of digital breast cancer tissue microarray core images. We also show that it extends beyond digital pathology by applying it to a radiology dataset of COVID-19 computed tomography images.
The platform described in this report resolves the challenges to the findable, accessible, interoperable, reusable (FAIR) stewardship of data and AI models by integrating with cloud storage to maintain user-centric governance over the data. It also enables distributed, federated computation for AI inference over those data and proves the viability of client-side AI in medical imaging.
The open-source application is publicly available at , with a short video demonstration at .