Spring Aaron M, Pittman Daniel J, Aghakhani Yahya, Jirsch Jeffrey, Pillay Neelan, Bello-Espinosa Luis E, Josephson Colin, Federico Paolo
Department of Clinical Neurosciences, University of Calgary, Calgary, AB, Canada.
Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada.
Front Neurol. 2018 Jun 28;9:510. doi: 10.3389/fneur.2018.00510. eCollection 2018.
We examined the interrater reliability and generalizability of high-frequency oscillation (HFO) visual evaluations in the ripple (80-250 Hz) band, and established a framework for the transition of HFO analysis to routine clinical care. We were interested in the interrater reliability, or epoch generalizability, to describe how similar the evaluations were between reviewers, and in the reviewer generalizability to represent the consistency of the internal threshold of each individual reviewer. We studied 41 adult epilepsy patients (mean age: 35.6 years) who underwent intracranial electroencephalography. A morphology detector was designed and used to detect candidate HFO events, lower-threshold events, and distractor events. These events were subsequently presented to six expert reviewers, who visually evaluated events for the presence of HFOs. Generalizability theory was used to characterize the epoch generalizability (interrater reliability) and reviewer generalizability (internal threshold consistency) of visual evaluations, as well as to project the numbers of epochs, reviewers, and datasets required to achieve strong generalizability (threshold of 0.8). The reviewer generalizability was almost perfect (0.983), indicating there were sufficient evaluations to determine the internal threshold of each reviewer. However, the interrater reliability for 6 reviewers (0.588) and pairwise interrater reliability (0.322) were both poor, indicating that the agreement of 6 reviewers is insufficient to reliably establish the presence or absence of individual HFOs. Strong interrater reliability (≥0.8) was projected as requiring a minimum of 17 reviewers, while strong reviewer generalizability could be achieved with <30 epoch evaluations per reviewer. This study reaffirms the poor reliability of using small numbers of reviewers to identify HFOs, and projects the number of reviewers required to overcome this limitation. It also provides a set of tools which may be used for training reviewers, tracking changes to interrater reliability, and constructing a benchmark set of epochs that can serve as a generalizable gold standard against which other HFO detection algorithms may be compared. This study represents an important step toward reconciling important but discordant findings from HFO studies undertaken with different sets of HFOs, and ultimately toward transitioning HFO analysis into a meaningful part of the clinical epilepsy workup.
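The projection of the number of reviewers can be illustrated with a simplified calculation. The sketch below is not the authors' generalizability analysis: it assumes a single-facet design in which the reliability of a panel follows the Spearman-Brown relation, and the function names are hypothetical. Under that assumption, the reported 6-reviewer value (0.588) is converted to an implied single-reviewer reliability and then projected to the panel size needed to reach 0.8. The reported pairwise value (0.322) is a different statistic and is not used here.

```python
import math

def single_rater_reliability(rel_k: float, k: int) -> float:
    """Invert Spearman-Brown: reliability of one rater implied by a k-rater panel."""
    return rel_k / (k - (k - 1) * rel_k)

def project_reliability(rel_1: float, n: int) -> float:
    """Spearman-Brown: reliability of the pooled judgment of n raters."""
    return n * rel_1 / (1 + (n - 1) * rel_1)

def raters_needed(rel_1: float, target: float) -> int:
    """Smallest panel size whose projected reliability reaches the target."""
    return math.ceil(target * (1 - rel_1) / (rel_1 * (1 - target)))

# Values taken from the abstract: 6 reviewers, interrater reliability 0.588,
# and a "strong" threshold of 0.8. The single-facet model is an assumption.
rel_1 = single_rater_reliability(0.588, 6)
n_required = raters_needed(rel_1, 0.8)

print(f"implied single-reviewer reliability: {rel_1:.3f}")
print(f"reviewers needed for 0.8: {n_required}")
print(f"projected reliability with {n_required} reviewers: "
      f"{project_reliability(rel_1, n_required):.3f}")
```

Under these assumptions the projected panel size is 17 reviewers, consistent with the minimum reported in the abstract, although the full analysis may partition variance across additional facets (epochs, datasets) that this sketch ignores.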