Svetnik Vladimir, Ma Junshui, Soper Keith A, Doran Scott, Renger John J, Deacon Steve, Koblan Ken S
Merck Research Laboratories, Biometrics Research, Rahway, NJ 07065, USA.
Sleep. 2007 Nov;30(11):1562-74. doi: 10.1093/sleep/30.11.1562.
To evaluate the performance of 2 automated systems, Morpheus and Somnolyzer24X7, with various levels of human review/editing, in scoring polysomnographic (PSG) recordings from a clinical trial using zolpidem in a model of transient insomnia.
A total of 164 all-night PSG recordings from 82 subjects, collected during 2 nights of sleep (one under placebo and one under zolpidem 10 mg), were used. For each recording, 6 methods were used to provide sleep stage scores based on Rechtschaffen & Kales criteria: 1) full manual scoring, 2) automated scoring by Morpheus, 3) automated scoring by Somnolyzer24X7, 4) automated scoring by Morpheus with full manual review, 5) automated scoring by Morpheus with partial manual review, and 6) automated scoring by Somnolyzer24X7 with partial manual review. Ten traditional clinical efficacy measures of sleep initiation, maintenance, and architecture were calculated.
Pair-wise epoch-by-epoch agreements between fully automated and manual scores were in the range of intersite manual scoring agreements reported in the literature (70%-72%). Pair-wise epoch-by-epoch agreements for manually reviewed automated scores were higher (73%-76%). The direction and statistical significance of treatment effect sizes using traditional efficacy endpoints were essentially the same whichever method was used. As the degree of manual review increased, the magnitude of the effect size approached that estimated with fully manual scoring.
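As an illustration of what a pair-wise epoch-by-epoch agreement measures, the following is a minimal sketch that assumes each scoring is a list of stage labels, one per epoch, aligned in time. The function name `epoch_agreement`, the stage labels, and the example sequences are hypothetical and are not taken from the study.

```python
from typing import Sequence


def epoch_agreement(scores_a: Sequence[str], scores_b: Sequence[str]) -> float:
    """Return the fraction of epochs on which two scorings assign the same stage."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both scorings must cover the same number of epochs")
    if len(scores_a) == 0:
        raise ValueError("at least one epoch is required")
    # Count the epochs where both scorers chose the same stage label.
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


# Hypothetical 8-epoch excerpt (illustrative labels, not study data).
manual = ["W", "1", "2", "2", "3", "REM", "2", "W"]
automated = ["W", "2", "2", "2", "3", "REM", "1", "W"]

agreement = epoch_agreement(manual, automated)
print(f"{agreement:.1%}")  # 6 of 8 epochs match: 75.0%
```

Raw percent agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa would need the full stage-by-stage confusion counts.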
Automated or semi-automated PSG sleep scoring offers a valuable alternative to manual scoring, which is costly, time-consuming, and variable both within and between sites, especially in large multicenter clinical trials. Reduced scoring variability may also reduce the sample size required for a clinical trial.
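The link between scoring variability and sample size can be sketched with the standard normal-approximation formula for a paired (crossover) comparison, n = ((z_{1-α/2} + z_{1-β}) · σ_d / δ)², where σ_d is the standard deviation of within-subject differences and δ is the treatment effect to detect. The function name `paired_sample_size` and all numeric inputs below are hypothetical and are not taken from the study; this is an illustration of the general relationship, not the authors' calculation.

```python
import math
from statistics import NormalDist


def paired_sample_size(delta: float, sd_diff: float,
                       alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects needed for a two-sided paired comparison.

    Uses the normal approximation n = ((z_alpha + z_beta) * sd_diff / delta)^2,
    rounded up to the next whole subject.
    """
    if delta <= 0 or sd_diff <= 0:
        raise ValueError("delta and sd_diff must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    return math.ceil(((z_alpha + z_beta) * sd_diff / delta) ** 2)


# Hypothetical inputs: the same effect size with two levels of variability.
n_more_variable = paired_sample_size(delta=10.0, sd_diff=30.0)
n_less_variable = paired_sample_size(delta=10.0, sd_diff=25.0)
print(n_more_variable, n_less_variable)
```

Because the required n scales with the square of σ_d, even a modest reduction in measurement variability lowers the number of subjects needed under this approximation.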