Chavez Tanny, Zhao Zhuowen, Jiang Runbo, Koepp Wiebke, McReynolds Dylan, Zwart Petrus H, Allan Daniel B, Gann Eliot H, Schwarz Nicholas, Ushizima Daniela, Barnard Edward S, Mehta Apurva, Sankaranarayanan Subramanian, Hexemer Alexander
Advanced Light Source, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
Center for Advanced Mathematics for Energy Research Applications, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
J Appl Crystallogr. 2025 May 12;58(Pt 3):731-745. doi: 10.1107/S1600576725002328. eCollection 2025 Jun 1.
This study introduces a novel labeling pipeline that accelerates the labeling of scientific data sets through artificial intelligence (AI)-guided tagging. The pipeline comprises a set of interconnected web-based graphical user interfaces (GUIs): two enable the preparation of machine learning (ML) models for data reduction and classification, respectively, and a third is used for label assignment. Throughout the pipeline, data can be accessed either through a direct connection to a file system or over Hypertext Transfer Protocol (HTTP). Our experimental results present three use cases in which the pipeline has been instrumental: pattern recognition in large X-ray scattering data sets, remote analysis of resonant soft X-ray scattering data, and the fine-tuning of foundation models. These use cases highlight the pipeline's labeling capabilities, including the ability to label large data sets in a short period of time, to perform remote data analysis while minimizing data movement, and to enhance the fine-tuning of complex ML models with human involvement.