
Using Large Language Models for Processing Sensor Data.

Author information

Hojda Maciej

Affiliation

Faculty of Information and Communication Technology, Wroclaw University of Science and Technology, Wyb. Wyspiańskiego 27, 50-370 Wrocław, Poland.

Publication information

Sensors (Basel). 2025 Jul 13;25(14):4380. doi: 10.3390/s25144380.

Abstract

Sensor data is widely available but stored in many different formats, which makes it difficult to reuse in other applications. We consider the problem of extracting sensor data from unstructured and semi-structured texts using Large Language Models. With careful prompt crafting, we establish a strict JSON structure that can easily be processed further by automated tools. We establish a workflow that enables the extraction of data using the GPT-4, Llama 3, Mistral and Falcon models, and we show that while the closed-source GPT-4 model generally leads in conversion efficiency, the open-source models can keep up with it when given appropriate data structures. We define new measures to simplify the comparison, and we present a multi-purpose workflow for sensor data extraction. We observe that some of the smaller models cannot correctly extract data from freeform text but handle tabular data well. Larger models, on the other hand, are more robust and avoid conversion mistakes more easily.
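The idea of prompting for a strict JSON structure and then processing the reply automatically can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the field names (`sensor`, `quantity`, `value`, `unit`), the prompt wording, and the helper functions `build_prompt` and `parse_readings` are hypothetical and are not taken from the paper. The model call itself is left out; the sketch only shows how a prompt might be assembled and how a reply might be checked before further processing.

```python
import json

# Hypothetical target structure; these field names are illustrative
# assumptions, not the schema used in the paper.
REQUIRED_FIELDS = {
    "sensor": str,
    "quantity": str,
    "value": (int, float),
    "unit": str,
}

PROMPT_TEMPLATE = (
    "Extract every sensor reading from the text below.\n"
    "Reply with a JSON array only, no commentary. Each element must be an "
    "object with exactly these keys: sensor (string), quantity (string), "
    "value (number), unit (string).\n\n"
    "Text:\n{text}"
)


def build_prompt(text: str) -> str:
    """Fill the extraction prompt with the source text."""
    return PROMPT_TEMPLATE.format(text=text)


def parse_readings(reply: str) -> list:
    """Parse a model reply and reject anything that departs from the structure."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on invalid JSON.
    data = json.loads(reply)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of readings")
    for item in data:
        if not isinstance(item, dict) or set(item) != set(REQUIRED_FIELDS):
            raise ValueError(f"unexpected keys in reading: {item!r}")
        for key, expected in REQUIRED_FIELDS.items():
            # bool is a subclass of int, so it is rejected explicitly.
            if isinstance(item[key], bool) or not isinstance(item[key], expected):
                raise ValueError(f"field {key!r} has the wrong type in {item!r}")
    return data
```

Rejecting malformed replies outright, rather than repairing them, keeps the downstream processing simple and makes conversion mistakes easy to count when comparing models.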


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1320/12299101/92720646e3ef/sensors-25-04380-g0A1.jpg
