A machine-learning-driven data labeling pipeline for scientific analysis in .

Author information

Chavez Tanny, Zhao Zhuowen, Jiang Runbo, Koepp Wiebke, McReynolds Dylan, Zwart Petrus H, Allan Daniel B, Gann Eliot H, Schwarz Nicholas, Ushizima Daniela, Barnard Edward S, Mehta Apurva, Sankaranarayanan Subramanian, Hexemer Alexander

Affiliations

Advanced Light Source, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.

Center for Advanced Mathematics for Energy Research Applications, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.

Publication information

J Appl Crystallogr. 2025 May 12;58(Pt 3):731-745. doi: 10.1107/S1600576725002328. eCollection 2025 Jun 1.

DOI:10.1107/S1600576725002328
PMID:40475946
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12135984/
Abstract

This study introduces a novel labeling pipeline to accelerate the labeling process of scientific data sets by using artificial intelligence (AI)-guided tagging techniques. This pipeline includes a set of interconnected web-based graphical user interfaces (GUIs), where and enable the preparation of machine learning (ML) models for data reduction and classification, respectively, while is used for label assignment. Throughout this pipeline, data can be accessed through a direct connection to a file system or through for access through Hypertext Transfer Protocol (HTTP). Our experimental results present three use cases where this labeling pipeline has been instrumental for the study of large X-ray scattering data sets in the area of pattern recognition, the remote analysis of resonant soft X-ray scattering data and the fine-tuning process of foundation models. These use cases highlight the labeling capabilities of this pipeline, including the ability to label large data sets in a short period of time, to perform remote data analysis while minimizing data movement and to enhance the fine-tuning process of complex ML models with human involvement.
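As a rough illustration of the AI-guided tagging idea described in the abstract, the sketch below proposes labels for unlabeled items from a handful of human-assigned seed labels and defers ambiguous items to the human. It is not the authors' implementation: the nearest-centroid rule, the `margin` threshold, the function names and the toy two-dimensional "latent" vectors (standing in for the output of a data-reduction model) are all assumptions made for this example.

```python
import math
from collections import defaultdict


def centroids(latent, seed_labels):
    """Mean latent vector per class, computed from the human-labeled seeds."""
    groups = defaultdict(list)
    for idx, label in seed_labels.items():
        groups[label].append(latent[idx])
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def propose_labels(latent, seed_labels, margin=0.5):
    """Propose a label for each unlabeled item, or None when ambiguous.

    An item is auto-labeled only if its nearest class centroid is closer
    than the second-nearest one by at least `margin` (a hypothetical
    threshold); otherwise it is left for the human to decide.
    """
    cents = centroids(latent, seed_labels)
    proposals = {}
    for idx, vec in enumerate(latent):
        if idx in seed_labels:
            continue  # already labeled by the human
        ranked = sorted((math.dist(vec, c), label) for label, c in cents.items())
        if len(ranked) == 1 or ranked[1][0] - ranked[0][0] >= margin:
            proposals[idx] = ranked[0][1]
        else:
            proposals[idx] = None  # too close to call: defer to the human
    return proposals


# Toy latent vectors (hypothetical) and two human-assigned seed labels.
latent = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0), (2.5, 2.6)]
seed_labels = {0: "ring", 2: "peak"}
proposals = propose_labels(latent, seed_labels)
print(proposals)
```

In this toy run, items 1 and 3 sit close to a seed and receive a proposed label, while item 4 lies roughly midway between the two classes and is left unlabeled for human review, which is the kind of human involvement the abstract refers to.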

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/025f/12135984/b91221411983/j-58-00731-fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/025f/12135984/1d57d6dac49b/j-58-00731-fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/025f/12135984/f6184c32ccd4/j-58-00731-fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/025f/12135984/18c667344697/j-58-00731-fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/025f/12135984/a61dc3cc3204/j-58-00731-fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/025f/12135984/db4eea11ae86/j-58-00731-fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/025f/12135984/febcdccd4a03/j-58-00731-fig7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/025f/12135984/c9277567d64c/j-58-00731-fig8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/025f/12135984/6569b1e34499/j-58-00731-fig9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/025f/12135984/d8a66da21cb3/j-58-00731-fig10.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/025f/12135984/13728b7143f0/j-58-00731-fig11.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/025f/12135984/f7dd187aa55c/j-58-00731-fig12.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/025f/12135984/a22c77db1e78/j-58-00731-fig13.jpg
