Choo Bryan Peide, Mok Yingjuan, Oh Hong Choon, Patanaik Amiya, Kishan Kishan, Awasthi Animesh, Biju Siddharth, Bhattacharjee Soumya, Poh Yvonne, Wong Hang Siang
Health Services Research, Changi General Hospital, Singapore, Singapore.
Department of Respiratory and Critical Care Medicine, Changi General Hospital, Singapore, Singapore.
Front Neurol. 2023 Feb 17;14:1123935. doi: 10.3389/fneur.2023.1123935. eCollection 2023.
The current gold standard for measuring sleep disorders is polysomnography (PSG), which is manually scored by a sleep technologist. Scoring a PSG is time-consuming and tedious, with substantial inter-rater variability. A deep-learning-based sleep analysis software module can perform autoscoring of PSG. The primary objective of the study is to validate the accuracy and reliability of the autoscoring software. The secondary objective is to measure workflow improvements in terms of time and cost via a time-and-motion study.
The performance of an automatic PSG scoring software was benchmarked against that of two independent sleep technologists on PSG data collected from patients with suspected sleep disorders. Technologists at the hospital clinic and at a third-party scoring company scored the PSG records independently. The scores were then compared between the technologists and the automatic scoring system. An observational study was also performed in which the time taken by sleep technologists at the hospital clinic to manually score PSGs was tracked, along with the time taken by the automatic scoring software, to assess potential time savings.
Pearson's correlation between the manually scored apnea-hypopnea index (AHI) and the automatically scored AHI was 0.962, demonstrating near-perfect agreement. The autoscoring system demonstrated similar results in sleep staging. The agreement between automatic staging and manual scoring was higher, in terms of accuracy and Cohen's kappa, than the agreement between experts. The autoscoring system took an average of 42.7 s to score each record, compared with 4,243 s for manual scoring. Following a manual review of the autoscores, an average time saving of 38.6 min per PSG was observed, amounting to 0.25 full-time equivalent (FTE) savings per year.
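The two agreement metrics reported above, Pearson's correlation for AHI and Cohen's kappa for epoch-by-epoch sleep staging, can be sketched as follows. This is a minimal illustration only: the AHI values and stage labels below are invented for demonstration and are not data from the study.

```python
# Hedged sketch of the two agreement metrics used in the study.
# All numeric data here is illustrative, not taken from the paper.

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' categorical labels (e.g. sleep stages)."""
    labels = sorted(set(r1) | set(r2))
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n          # observed agreement
    p_exp = sum((r1.count(l) / n) * (r2.count(l) / n)        # chance agreement
                for l in labels)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical AHI values (events/hour): manual vs. automatic scoring
manual_ahi = [5.2, 14.8, 33.1, 61.0, 8.9]
auto_ahi = [5.0, 15.3, 31.7, 59.4, 9.6]
print("Pearson r:", round(pearson_r(manual_ahi, auto_ahi), 3))

# Hypothetical 30-s epoch stage labels (W, N1, N2, N3, R) from two scorers
tech_stages = ["W", "W", "N1", "N2", "N2", "N3", "N3", "R", "R", "W"]
auto_stages = ["W", "W", "N1", "N2", "N2", "N2", "N3", "R", "R", "W"]
print("Cohen's kappa:", round(cohens_kappa(tech_stages, auto_stages), 3))
```

In practice `scipy.stats.pearsonr` and `sklearn.metrics.cohen_kappa_score` compute the same quantities; the hand-rolled versions are shown only to make the definitions explicit.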
The findings indicate a potential for a reduction in the burden of manual scoring of PSGs by sleep technologists and may be of operational significance for sleep laboratories in the healthcare setting.