Crowell D H, Brooks L J, Colton T, Corwin M J, Hoppenbrouwers T T, Hunt C E, Kapuniai L E, Lister G, Neuman M R, Peucker M, Ward S L, Weese-Mayer D E, Willinger M
Kapiolani Medical Center for Women and Children, Honolulu, Hawaii 96826, USA.
Sleep. 1997 Jul;20(7):553-60.
Infant polysomnography (IPSG) is an increasingly important procedure for studying infants with sleep and breathing disorders. Because analysis of IPSG data is subjective, an equally important issue is the reliability, or strength of agreement, among scorers (especially experienced clinicians) of sleep parameters (SP) and sleep states (SS). One basic aspect of this problem was examined by proposing and testing the hypothesis that infant SP and SS can be scored reliably at substantial levels of agreement, that is, kappa (κ) ≥ 0.61. In light of the importance of IPSG reliability in the Collaborative Home Infant Monitoring Evaluation (CHIME) study, a reliability training and evaluation process was developed and implemented. Training in SP and SS scoring was based on the CHIME criteria, which modify and supplement those of Anders, Emde, and Parmelee (10). The kappa statistic was adopted as the method for evaluating reliability between and among scorers. The scorers were three experienced investigators and four trainees. Interrater and intrarater reliabilities for SP codes and SS were calculated for 408 randomly selected 30-second epochs of nocturnal IPSG recorded at five CHIME clinical sites from enrolled subjects who were healthy full-term infants (n = 5), preterm infants (n = 4), infants with apnea of infancy (n = 2), and siblings of victims of sudden infant death syndrome (SIDS) (n = 4). IPSG data set 1 was scored by both the experienced investigators and the trained scorers and was used to assess initial interrater reliability. IPSG data set 2 was scored twice by the trained scorers and was used to reassess interrater reliability and to assess intrarater reliability. The κ values for SS for data set 1 ranged from 0.45 to 0.58, representing only a moderate level of agreement. Rater disagreements were therefore reviewed, and the scoring criteria were modified to clarify ambiguities. The κ values and confidence intervals (CIs) computed for data set 2 showed substantial interrater and intrarater agreement for the four trained scorers: for SS, κ = 0.68, and for SP, the κ values ranged from 0.62 to 0.76. Acceptance of the hypothesis supports the conclusion that IPSG, when backed by significant κ values and CIs, is a reliable source of clinical and research data. Reliability can be maximized with strictly detailed scoring guidelines and training.
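The interrater comparisons above rest on Cohen's kappa, a chance-corrected agreement statistic. The following is a minimal illustrative sketch (not drawn from the CHIME study) of how κ is computed for two scorers' sleep-state codes over 30-second epochs; the state labels and epoch data are hypothetical.

```python
# Minimal sketch: Cohen's kappa for two raters' categorical codes.
# The sleep-state labels (AS, QS, IN, AW) and the epoch data are illustrative,
# not CHIME scoring output.
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement: kappa = (p_o - p_e) / (1 - p_e)."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed proportion of epochs on which the two raters agree (p_o).
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal frequencies (p_e).
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical 30-second-epoch codes: AS = active sleep, QS = quiet sleep,
# IN = indeterminate, AW = awake.
scorer_1 = ["AS", "AS", "QS", "QS", "IN", "AW", "QS", "AS"]
scorer_2 = ["AS", "QS", "QS", "QS", "IN", "AW", "QS", "AS"]
print(f"kappa = {cohen_kappa(scorer_1, scorer_2):.2f}")
```

On the conventional benchmarks used in the abstract, κ between 0.41 and 0.60 indicates moderate agreement and κ of 0.61 or above indicates substantial agreement, which is why the study's acceptance threshold was set at κ ≥ 0.61.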