Sun Chenxi, Li Hongyan, Song Moxian, Hong Shenda
IEEE Trans Neural Netw Learn Syst. 2024 Aug;35(8):11194-11203. doi: 10.1109/TNNLS.2023.3250203. Epub 2024 Aug 5.
Early classification aims to classify a time series before the full series has been observed. It is critical in time-sensitive applications such as early sepsis diagnosis in the intensive care unit (ICU), where an earlier diagnosis gives doctors more opportunities to save lives. However, the early classification task has two conflicting goals: accuracy and earliness. Most existing methods try to balance them by trading one goal off against the other. We argue instead that a powerful early classifier should make highly accurate predictions at every moment. The main obstacle is that the key features for classification are not yet evident in the early stages, so the distributions of time series from different time stages overlap excessively. Such nearly indistinguishable distributions are difficult for classifiers to separate. To solve this problem, this article proposes a novel ranking-based cross-entropy (RCE) loss that jointly learns class features and the order of earliness from time series data. In this way, RCE helps the classifier generate probability distributions for time series at different stages with more distinguishable boundaries, which ultimately improves classification accuracy at each time step. In addition, to make the method more practical, we accelerate training by focusing the learning process on high-ranking samples. Experiments on three real-world datasets show that our method classifies more accurately than all baselines at all moments.
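The abstract does not give the formula for the RCE loss. Purely as an illustration of the general idea of coupling class learning with an earliness ordering, the sketch below adds a pairwise ranking penalty to per-prefix cross-entropy. It assumes the classifier emits class logits after each observed prefix and that the true-class probability should not drop as more of the series is seen. This is not the authors' RCE loss, and it does not model the training speed-up on high-ranking samples. The function names, the pairwise hinge, and the `margin` and `rank_weight` parameters are all hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ranked_cross_entropy(
    logits_per_step: Sequence[Sequence[float]],
    label: int,
    margin: float = 0.0,
    rank_weight: float = 1.0,
) -> float:
    """Hypothetical ranking-augmented cross-entropy for one time series.

    logits_per_step[t] holds class logits after observing the first t + 1
    time steps. The loss is the mean cross-entropy over all prefixes plus a
    hinge penalty on every pair t < s whose true-class probability fails to
    rise from prefix t to prefix s by at least `margin`.
    """
    # Probability assigned to the true class at each prefix length.
    probs = [softmax(logits)[label] for logits in logits_per_step]

    # Standard cross-entropy, averaged over all prefixes.
    ce = -sum(math.log(p) for p in probs) / len(probs)

    # Pairwise hinge: want probs[s] >= probs[t] + margin for every t < s.
    penalties = [
        max(0.0, probs[t] - probs[s] + margin)
        for t in range(len(probs))
        for s in range(t + 1, len(probs))
    ]
    rank = sum(penalties) / len(penalties) if penalties else 0.0

    return ce + rank_weight * rank


# Usage: a series whose true-class confidence grows over time versus one
# whose confidence shrinks. Both have the same cross-entropy term, but only
# the shrinking one pays the ranking penalty.
increasing = [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
decreasing = list(reversed(increasing))
print(ranked_cross_entropy(increasing, label=1))
print(ranked_cross_entropy(decreasing, label=1))
```

The pairwise hinge is just one simple way to express an ordering constraint over time stages. A margin-ranking loss over learned scores, or ranking across samples rather than across prefixes of one sample, would be equally plausible readings of the abstract.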