Biurrun Manresa José A, Arguissain Federico G, Medina Redondo David E, Mørch Carsten D, Andersen Ole K
Center for Sensory-Motor Interaction, Dept. of Health Science and Technology, Aalborg University, Aalborg, Denmark.
Departamento de Informática, Universidad Nacional de Entre Ríos, Oro Verde, Entre Ríos, Argentina.
PLoS One. 2015 Aug 10;10(8):e0134127. doi: 10.1371/journal.pone.0134127. eCollection 2015.
It is largely unknown how well humans and algorithms agree on whether an event-related potential (ERP) is present, and how much the estimated values of its relevant features vary between them. The aim of this study was therefore to determine the categorical and quantitative agreement between manual and automated methods for single-trial detection and estimation of ERP features. To this end, ERPs were elicited in sixteen healthy volunteers using electrical stimulation at graded intensities below and above the nociceptive withdrawal reflex threshold. The presence or absence of an ERP peak (categorical outcome) and its amplitude and latency (quantitative outcomes) in each single trial were evaluated independently by two human observers and two automated algorithms taken from the existing literature. Categorical agreement was assessed using percentage positive and negative agreement and Cohen's κ, whereas quantitative agreement was evaluated using Bland-Altman analysis and the coefficient of variation. Typical values for the categorical agreement between manual and automated methods were derived, as well as reference values for the average and maximum differences that can be expected if one method is used instead of the others. The two human observers showed the highest categorical and quantitative agreement, and there were significant, large differences among methods in both detection and the estimation of quantitative features. In conclusion, substantial care should be taken in selecting the detection/estimation approach, since factors such as stimulation intensity and the expected number of trials with and without a response can play a significant role in the outcome of a study.
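The agreement measures named above can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that positive and negative agreement are computed as proportions of specific agreement, that Bland-Altman limits of agreement are taken as bias ± 1.96 standard deviations of the paired differences, and that both raters mark at least some trials as present and some as absent (otherwise the denominators can be zero). The function names and the input layout (one boolean or one number per trial for each rater) are hypothetical.

```python
from statistics import mean, stdev


def categorical_agreement(a, b):
    """Agreement between two raters on peak presence.

    a, b: equal-length sequences of booleans, True = ERP peak detected.
    Returns (positive agreement, negative agreement, Cohen's kappa).
    Assumption: positive/negative agreement are proportions of
    specific agreement, 2*agree / (2*agree + disagreements).
    """
    n = len(a)
    both_pos = sum(1 for x, y in zip(a, b) if x and y)
    both_neg = sum(1 for x, y in zip(a, b) if not x and not y)
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if not x and y)
    disagreements = only_a + only_b

    ppa = 2 * both_pos / (2 * both_pos + disagreements)
    pna = 2 * both_neg / (2 * both_neg + disagreements)

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from each rater's marginal totals.
    p_obs = (both_pos + both_neg) / n
    p_exp = (
        (both_pos + only_a) * (both_pos + only_b)
        + (both_neg + only_b) * (both_neg + only_a)
    ) / n**2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return ppa, pna, kappa


def bland_altman(x, y):
    """Bias and 95% limits of agreement for paired measurements.

    x, y: equal-length sequences of amplitudes or latencies estimated
    by two methods on the same trials.
    Returns (bias, lower limit, upper limit).
    """
    diffs = [xi - yi for xi, yi in zip(x, y)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Made-up illustrative ratings, not data from the study.
    rater_1 = [True, True, True, False, False, True, False, True, False, False]
    rater_2 = [True, True, False, False, False, True, True, True, False, False]
    print(categorical_agreement(rater_1, rater_2))

    amp_1 = [10.0, 12.0, 9.0, 11.0]
    amp_2 = [9.0, 12.5, 8.0, 10.5]
    print(bland_altman(amp_1, amp_2))
```

Reporting positive and negative agreement alongside κ is useful when the proportion of trials with a response is far from one half, because κ alone depends strongly on that proportion.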