California State University, Long Beach, CA.
American Academy of Sleep Medicine, Darien, IL.
J Clin Sleep Med. 2014 Apr 15;10(4):447-54. doi: 10.5664/jcsm.3630.
The American Academy of Sleep Medicine (AASM) Inter-scorer Reliability program provides a unique opportunity to compare a large number of scorers with varied levels of experience to determine agreement in the scoring of respiratory events. The objective of this paper is to examine areas of disagreement to inform future revisions of the AASM Manual for the Scoring of Sleep and Associated Events.
The sample included 15 monthly records of 200 epochs each. The number of scorers increased steadily during the period of data collection, reaching more than 3,600 scorers by the final record. Scorers were asked to identify whether an obstructive, mixed, or central apnea; a hypopnea; or no event was seen in each of the 200 epochs. The "correct" respiratory event score was defined as the score endorsed by the most scorers. Percentage agreement with the majority score was determined for each epoch, and the mean agreement across epochs was then calculated.
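The majority-score procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's analysis code: the label strings ("none", "OA", "H"), the function names, and the toy data are all hypothetical, and the handling of ties is an assumption because the abstract does not describe it.

```python
from collections import Counter

def majority_and_agreement(epoch_scores):
    """Return (majority_label, percent_agreement) for one epoch.

    epoch_scores: list of labels, one per scorer, for a single epoch.
    Assumption: on a tie, the label encountered first wins.
    """
    counts = Counter(epoch_scores)
    label, n = counts.most_common(1)[0]
    return label, 100.0 * n / len(epoch_scores)

def mean_agreement(record):
    """Mean of the per-epoch percentage agreement with the majority score."""
    per_epoch = [majority_and_agreement(scores)[1] for scores in record]
    return sum(per_epoch) / len(per_epoch)

# Hypothetical toy record: 3 epochs, 5 scorers each (not data from the study).
toy_record = [
    ["none", "none", "none", "none", "none"],  # unanimous: no event
    ["OA", "OA", "OA", "H", "OA"],             # majority: obstructive apnea
    ["H", "H", "none", "OA", "H"],             # majority: hypopnea
]

for scores in toy_record:
    print(majority_and_agreement(scores))
print(mean_agreement(toy_record))
```

Averaging per-epoch percentages, rather than pooling all scores, gives every epoch equal weight even though the number of scorers differed from record to record.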
The overall agreement for scoring of respiratory events was 93.9% (κ = 0.92). There was very high agreement on epochs without respiratory events (97.4%), and the majority score for most of the epochs (87.8%) was no event. For the 364 epochs scored by the majority as having a respiratory event, overall agreement that some type of respiratory event occurred was 88.4% (κ = 0.77). The agreement for epochs scored as obstructive apnea by the majority was 77.1% (κ = 0.71), and the most common disagreement was a score of hypopnea instead of obstructive apnea (14.4%). The agreement for hypopnea was 65.4% (κ = 0.57), with 16.4% of scores indicating no event and 14.8% indicating obstructive apnea. The agreement for central apnea was 52.4% (κ = 0.41). Only a single epoch was scored as a mixed apnea by a plurality of scorers.
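A breakdown like the one reported above (for example, how often hypopnea was scored on epochs whose majority score was obstructive apnea) could be tabulated roughly as follows. This is a hedged sketch only: the label abbreviations, the function name, and the toy data are hypothetical, and averaging per-epoch percentages within each majority class is an assumption carried over from the stated method, not a confirmed detail of the study.

```python
from collections import Counter, defaultdict

# Hypothetical label abbreviations: no event, obstructive, mixed, central, hypopnea.
LABELS = ("none", "OA", "MA", "CA", "H")

def breakdown_by_majority(epochs, labels=LABELS):
    """For each majority label, average the per-epoch percentage of every label."""
    sums = defaultdict(lambda: dict.fromkeys(labels, 0.0))
    n_epochs = Counter()
    for scores in epochs:
        counts = Counter(scores)
        majority = counts.most_common(1)[0][0]  # assumption: first label wins ties
        n_epochs[majority] += 1
        for lab in labels:
            sums[majority][lab] += 100.0 * counts[lab] / len(scores)
    return {
        maj: {lab: total / n_epochs[maj] for lab, total in per_label.items()}
        for maj, per_label in sums.items()
    }

# Hypothetical toy epochs (not data from the study).
toy_epochs = [
    ["OA", "OA", "OA", "H", "OA"],  # majority OA: 80% OA, 20% H
    ["OA", "OA", "H", "H", "OA"],   # majority OA: 60% OA, 40% H
    ["H", "H", "none", "OA", "H"],  # majority H: 60% H, 20% none, 20% OA
]

table = breakdown_by_majority(toy_epochs)
for majority, row in table.items():
    print(majority, row)
```

In each row, the entry for the majority label is the agreement for that event type, and the remaining entries show where the disagreeing scores went.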
The study demonstrated excellent agreement among a large sample of scorers for epochs with no respiratory events. Agreement that some type of event occurred was good, but disagreements in scoring apnea vs. hypopnea and in the type of apnea were common. A limitation of the analysis is that most of the records had normal breathing. A review of controversial events yielded no consistent bias that might be resolved by a change of scoring rules.