Suppr 超能文献


ClinClip: a Multimodal Language Pre-training model integrating EEG data for enhanced English medical listening assessment.

Author

Sun Guangyu

Affiliation

The Basic Department, The Tourism College of Changchun University, Changchun, China.

Publication

Front Neurosci. 2025 Jan 7;18:1493163. doi: 10.3389/fnins.2024.1493163. eCollection 2024.

DOI: 10.3389/fnins.2024.1493163
PMID: 39850622
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11755411/
Abstract

INTRODUCTION

In the field of medical listening assessments, accurate transcription and effective cognitive load management are critical for enhancing healthcare delivery. Traditional speech recognition systems, while successful in general applications, often struggle in medical contexts, where the cognitive state of the listener plays a significant role. These conventional methods typically rely on audio-only inputs and cannot account for the listener's cognitive load, leading to reduced accuracy and effectiveness in complex medical environments.

METHODS

To address these limitations, this study introduces ClinClip, a novel multimodal model that integrates EEG signals with audio data through a transformer-based architecture. ClinClip is designed to dynamically adjust to the cognitive state of the listener, thereby improving transcription accuracy and robustness in medical settings. The model leverages cognitive-enhanced strategies, including EEG-based modulation and hierarchical fusion of multimodal data, to overcome the challenges faced by traditional methods.
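The paper does not publish its architecture in this abstract, but the idea of "EEG-based modulation" of audio features can be illustrated with a much-simplified sketch: a pooled EEG embedding forms an attention query over frame-level audio features, so the cognitive-state signal reweights which audio frames dominate the fused representation. All names, dimensions, and weight matrices below are illustrative assumptions, not ClinClip's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def eeg_modulated_fusion(audio_feats, eeg_feats, w_q, w_k):
    """Toy EEG-conditioned attention over audio frames (hypothetical).

    audio_feats: (T, d) frame-level audio embeddings
    eeg_feats:   (d_e,) pooled EEG embedding for the listener's state
    w_q:         (d_e, d) projection of EEG into a query
    w_k:         (d, d)  projection of audio frames into keys
    """
    q = eeg_feats @ w_q                    # (d,) EEG-derived query
    k = audio_feats @ w_k                  # (T, d) keys from audio frames
    scores = k @ q / np.sqrt(q.shape[0])   # (T,) scaled dot-product scores
    attn = softmax(scores)                 # (T,) weights over frames
    return attn @ audio_feats              # (d,) EEG-weighted audio summary
```

In a full model, this reweighted summary would feed a transformer decoder; the sketch only shows the modulation step, where attention weights depend on the EEG state rather than on the audio alone.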

RESULTS AND DISCUSSION

Experiments conducted on four datasets (EEGEyeNet, DEAP, PhyAAt, and eSports Sensors) demonstrate that ClinClip significantly outperforms six state-of-the-art models in both Word Error Rate (WER) and Cognitive Modulation Efficiency (CME). These results underscore the model's effectiveness in handling complex medical audio scenarios and highlight its potential to improve the accuracy of medical listening assessments. By addressing the cognitive aspects of the listening process, ClinClip contributes to more reliable and effective healthcare delivery, offering a substantial advancement over traditional speech recognition approaches.
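WER, the primary metric above, is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. A minimal self-contained implementation:

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N, via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # deleting all of ref[:i]
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # inserting all of hyp[:j]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the patient has acute pain", "the patient had pain")` counts one substitution ("has" → "had") and one deletion ("acute") against five reference words, giving 0.4.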


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e91/11755411/983f4d0e5269/fnins-18-1493163-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e91/11755411/69347e611a76/fnins-18-1493163-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e91/11755411/eeec5f8ab6f4/fnins-18-1493163-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e91/11755411/c25aa1851a9c/fnins-18-1493163-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e91/11755411/a45ff5d961f9/fnins-18-1493163-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2e91/11755411/0325ba718cf9/fnins-18-1493163-g0006.jpg

Similar articles

1
ClinClip: a Multimodal Language Pre-training model integrating EEG data for enhanced English medical listening assessment.
Front Neurosci. 2025 Jan 7;18:1493163. doi: 10.3389/fnins.2024.1493163. eCollection 2024.
2
EEG-powered cerebral transformer for athletic performance.
Front Neurorobot. 2024 Dec 20;18:1499734. doi: 10.3389/fnbot.2024.1499734. eCollection 2024.
3
Brain-inspired multimodal motion and fine-grained action recognition.
Front Neurorobot. 2025 Jan 24;18:1502071. doi: 10.3389/fnbot.2024.1502071. eCollection 2024.
4
MMAgentRec, a personalized multi-modal recommendation agent with large language model.
Sci Rep. 2025 Apr 8;15(1):12062. doi: 10.1038/s41598-025-96458-w.
5
Multimodal robot-assisted English writing guidance and error correction with reinforcement learning.
Front Neurorobot. 2024 Nov 20;18:1483131. doi: 10.3389/fnbot.2024.1483131. eCollection 2024.
6
PilotCareTrans Net: an EEG data-driven transformer for pilot health monitoring.
Front Hum Neurosci. 2025 Jan 29;19:1503228. doi: 10.3389/fnhum.2025.1503228. eCollection 2025.
7
Multimodal Fusion of EEG and Audio Spectrogram for Major Depressive Disorder Recognition Using Modified DenseNet121.
Brain Sci. 2024 Oct 15;14(10):1018. doi: 10.3390/brainsci14101018.
8
A Multimodal Pain Sentiment Analysis System Using Ensembled Deep Learning Approaches for IoT-Enabled Healthcare Framework.
Sensors (Basel). 2025 Feb 17;25(4):1223. doi: 10.3390/s25041223.
9
Decoding disparities: evaluating automatic speech recognition system performance in transcribing Black and White patient verbal communication with nurses in home healthcare.
JAMIA Open. 2024 Dec 10;7(4):ooae130. doi: 10.1093/jamiaopen/ooae130. eCollection 2024 Dec.
10
Cross-modality fusion with EEG and text for enhanced emotion detection in English writing.
Front Neurorobot. 2025 Jan 17;18:1529880. doi: 10.3389/fnbot.2024.1529880. eCollection 2024.

Cited by

1
Simultaneous interpreting with auto-subtitling: Investigating viewer cognitive effort, stress, and comprehension.
PLoS One. 2025 Aug 22;20(8):e0330692. doi: 10.1371/journal.pone.0330692. eCollection 2025.

References

1
Comparisons of air-conduction hearing thresholds between manual and automated methods in a commercial audiometer.
Front Neurosci. 2023 Dec 21;17:1292395. doi: 10.3389/fnins.2023.1292395. eCollection 2023.
2
Real-time Context-Aware Multimodal Network for Activity and Activity-Stage Recognition from Team Communication in Dynamic Clinical Settings.
Proc ACM Interact Mob Wearable Ubiquitous Technol. 2023 Mar;7(1). doi: 10.1145/3580798. Epub 2023 Mar 28.
3
The Progress of Speech Recognition in Health Care: Surgery as an Example.
Stud Health Technol Inform. 2023 Jun 29;305:414-418. doi: 10.3233/SHTI230519.
4
Few-shot short utterance speaker verification using meta-learning.
PeerJ Comput Sci. 2023 Apr 21;9:e1276. doi: 10.7717/peerj-cs.1276. eCollection 2023.
5
Effectiveness of an Over-the-Counter Self-fitting Hearing Aid Compared With an Audiologist-Fitted Hearing Aid: A Randomized Clinical Trial.
JAMA Otolaryngol Head Neck Surg. 2023 Jun 1;149(6):522-530. doi: 10.1001/jamaoto.2023.0376.
6
IoT-Enabled WBAN and Machine Learning for Speech Emotion Recognition in Patients.
Sensors (Basel). 2023 Mar 8;23(6):2948. doi: 10.3390/s23062948.
7
The Usefulness of Electronic Health Records From Preventive Youth Healthcare in the Recognition of Child Mental Health Problems.
Front Public Health. 2021 May 31;9:658240. doi: 10.3389/fpubh.2021.658240. eCollection 2021.