Semantically-Enhanced Feature Extraction with CLIP and Transformer Networks for Driver Fatigue Detection.

Authors

Gao Zhen, Chen Xiaowen, Xu Jingning, Yu Rongjie, Zhang Heng, Yang Jinqiu

Affiliations

School of Computer Science and Technology, Tongji University, Shanghai 201804, China.

Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China.

Publication information

Sensors (Basel). 2024 Dec 12;24(24):7948. doi: 10.3390/s24247948.

DOI: 10.3390/s24247948
PMID: 39771685
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11679248/
Abstract

Drowsy driving is a leading cause of commercial vehicle traffic crashes. The current trend is to train fatigue detection models with deep neural networks on driver video data, but challenges remain: high-level feature extraction is coarse and incomplete, and network architectures still need optimization. This paper pioneers the use of the CLIP (Contrastive Language-Image Pre-training) model for fatigue detection and uses a Transformer architecture to extract long-term temporal features from video sequences. The proposed CT-Net (CLIP-Transformer Network) achieves an AUC (Area Under the Curve) of 0.892, a 36% accuracy improvement over the prevalent CNN-LSTM (Convolutional Neural Network-Long Short-Term Memory) end-to-end model, reaching state-of-the-art performance. Experiments show that the CLIP pre-trained model extracts facial and behavioral features from driver video frames more accurately, improving the model's AUC by 7% over the ImageNet-based pre-trained model. Moreover, compared with LSTM, the Transformer captures long-term dependencies among temporal features more flexibly, further improving the model's AUC by 4%.
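
The abstract reports its results as AUC values. As a reminder of what that metric measures, the following is a minimal sketch of the rank-based definition: the probability that a randomly chosen positive sample (a fatigued clip) receives a higher score than a randomly chosen negative one (an alert clip), with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values (1 = fatigued, by assumption).
    scores: iterable of model scores, higher meaning "more likely fatigued".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a correct pair
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

This pairwise form is quadratic in the number of samples, which is fine for illustration; a sort-based computation would normally be used for large evaluation sets.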


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d269/11679248/a0883fe959a6/sensors-24-07948-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d269/11679248/8ac4352d8742/sensors-24-07948-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d269/11679248/fbf83277ebeb/sensors-24-07948-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d269/11679248/a97fbe85814c/sensors-24-07948-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d269/11679248/712787751f28/sensors-24-07948-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d269/11679248/f00ced3aa7e2/sensors-24-07948-g006.jpg

Similar articles

1. Brain tumor segmentation and detection in MRI using convolutional neural networks and VGG16.
   Cancer Biomark. 2025 Mar;42(3):18758592241311184. doi: 10.1177/18758592241311184. Epub 2025 Apr 4.
2. Real-Time Fatigue Detection Algorithms Using Machine Learning for Yawning and Eye State.
   Sensors (Basel). 2024 Dec 6;24(23):7810. doi: 10.3390/s24237810.
3. Using long short term memory and convolutional neural networks for driver drowsiness detection.
   Accid Anal Prev. 2021 Jun;156:106107. doi: 10.1016/j.aap.2021.106107. Epub 2021 Apr 10.
4. Optimized driver fatigue detection method using multimodal neural networks.
   Sci Rep. 2025 Apr 10;15(1):12240. doi: 10.1038/s41598-025-86709-1.
5. A novel deep-learning model based on τ-shaped convolutional network (τNet) with long short-term memory (LSTM) for physiological fatigue detection from EEG and EOG signals.
   Med Biol Eng Comput. 2024 Jun;62(6):1781-1793. doi: 10.1007/s11517-024-03033-y. Epub 2024 Feb 20.
6. End-to-end fatigue driving EEG signal detection model based on improved temporal-graph convolution network.
   Comput Biol Med. 2023 Jan;152:106431. doi: 10.1016/j.compbiomed.2022.106431. Epub 2022 Dec 16.
7. Facial Micro-Expression Recognition Enhanced by Score Fusion and a Hybrid Model from Convolutional LSTM and Vision Transformer.
   Sensors (Basel). 2023 Jun 16;23(12):5650. doi: 10.3390/s23125650.
8. A dense multi-pooling convolutional network for driving fatigue detection.
   Sci Rep. 2025 May 3;15(1):15518. doi: 10.1038/s41598-025-99441-7.
9. Detection of Drowsiness among Drivers Using Novel Deep Convolutional Neural Network Model.
   Sensors (Basel). 2023 Oct 26;23(21):8741. doi: 10.3390/s23218741.

Cited by

1. Cross-Modal Weakly Supervised RGB-D Salient Object Detection with a Focus on Filamentary Structures.
   Sensors (Basel). 2025 May 9;25(10):2990. doi: 10.3390/s25102990.
