Yang S, López S, Golmohammadi M, Obeid I, Picone J
Neural Engineering Data Consortium, Temple University, Philadelphia, Pennsylvania, USA, {scott.yang, silvia.lopez, meysam, obeid, picone}@temple.edu.
IEEE Signal Process Med Biol Symp. 2016 Dec;2016. doi: 10.1109/SPMB.2016.7846855. Epub 2017 Feb 9.
To be effective, state-of-the-art machine learning technology needs large amounts of annotated data. Numerous compelling healthcare applications could benefit from the high-performance automated decision support systems that deep learning technology provides, but these applications lack the comprehensive data resources required to apply sophisticated machine learning models. Further, for economic reasons, it is very difficult to justify creating large annotated corpora for them. Hence, automated annotation techniques are becoming increasingly important. In this study, we investigated the effectiveness of using an active learning algorithm to automatically annotate a large EEG corpus. The algorithm is designed to annotate six types of EEG events. Two model training schemes, threshold-based and volume-based, are evaluated. In the threshold-based scheme, the confidence-score threshold is optimized in the initial training iteration; in the volume-based scheme, only a specified amount of data is retained after each iteration. Recognition performance improves by 2% absolute, and the system is capable of automatically annotating previously unlabeled data. Given that interpreting clinical EEG data is an exceedingly difficult task, this study provides some evidence that the proposed method is a viable alternative to expensive manual annotation.
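The abstract gives no implementation details, so the following is only a minimal sketch, under stated assumptions, of how the two data-selection schemes might differ inside an iterative automatic-labeling loop. It reads the loop as keeping confident machine labels as new training data; it is not the authors' code. The function names (`select_by_threshold`, `select_by_volume`, `self_label`), the placeholder segment IDs and labels, the 0.75 threshold, the 50% retained fraction, and the stand-in `predict` callable (which replaces retraining the actual EEG classifier) are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A candidate is (segment_id, predicted_label, confidence). Segment IDs and
# labels are placeholders, not the corpus's actual six event names.
Candidate = Tuple[str, str, float]


def select_by_threshold(cands: Sequence[Candidate], threshold: float) -> List[Candidate]:
    """Threshold-based scheme: keep every machine-labeled segment whose
    confidence is at or above a fixed threshold (assumed tuned once, up front)."""
    return [c for c in cands if c[2] >= threshold]


def select_by_volume(cands: Sequence[Candidate], fraction: float) -> List[Candidate]:
    """Volume-based scheme: keep only a fixed fraction of the candidates,
    most confident first, regardless of their absolute scores."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    k = int(len(cands) * fraction)
    return sorted(cands, key=lambda c: c[2], reverse=True)[:k]


def self_label(
    labeled: Sequence[Tuple[str, str]],
    pool: Sequence[str],
    predict: Callable[[List[Tuple[str, str]], str], Tuple[str, float]],
    select: Callable[[List[Candidate]], List[Candidate]],
    iterations: int,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Iteratively label the pool: predict, keep the selected subset as new
    training data, and remove it from the unlabeled pool."""
    labeled = list(labeled)
    remaining = list(pool)
    for _ in range(iterations):
        # A real system would retrain the model on `labeled` here; the
        # hypothetical `predict` callable stands in for that step.
        cands = [(seg, *predict(labeled, seg)) for seg in remaining]
        chosen = select(cands)
        if not chosen:
            break  # nothing passed the selection rule (or pool is empty); stop
        chosen_ids = {c[0] for c in chosen}
        labeled.extend((seg, label) for seg, label, _conf in chosen)
        remaining = [seg for seg in remaining if seg not in chosen_ids]
    return labeled, remaining


if __name__ == "__main__":
    # Made-up confidences for four unlabeled segments.
    toy = {"s1": ("A", 0.95), "s2": ("B", 0.40), "s3": ("C", 0.80), "s4": ("D", 0.60)}

    def toy_predict(labeled: List[Tuple[str, str]], seg: str) -> Tuple[str, float]:
        return toy[seg]

    seed = [("s0", "A")]
    pool = ["s1", "s2", "s3", "s4"]
    _, left = self_label(seed, pool, toy_predict, lambda c: select_by_threshold(c, 0.75), 5)
    print(left)  # ['s2', 's4']
    _, left = self_label(seed, pool, toy_predict, lambda c: select_by_volume(c, 0.5), 5)
    print(left)  # ['s2']
```

In this toy run the threshold rule stops once no remaining segment clears 0.75, while the volume rule keeps absorbing the most confident half of whatever is left, so it labels more data at the cost of accepting lower-confidence labels. A broadly similar distinction appears in the `criterion="threshold"` and `criterion="k_best"` options of scikit-learn's `SelfTrainingClassifier`.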