Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, USA.
BMC Bioinformatics. 2012 Aug 8;13:195. doi: 10.1186/1471-2105-13-195.
Early classification of time series is beneficial for biomedical informatics problems such as, but not limited to, disease change detection. Early classification can be of tremendous help by identifying the onset of a disease before it has time to fully take hold. In addition, extracting patterns from the original time series helps domain experts gain insight into the classification results. This problem has recently been studied using time series segments called shapelets. In this paper, we present a method, which we call Multivariate Shapelets Detection (MSD), that allows for early and patient-specific classification of multivariate time series. The method extracts, from all dimensions of the time series, patterns called multivariate shapelets that distinctly manifest the target class locally. A time series is then classified by searching for the earliest closest patterns.
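The classification step described above can be sketched as follows. The classifier reads an incoming multivariate series one time point at a time and commits to the class of the earliest shapelet that matches closely enough. This is a minimal illustration under stated assumptions, not the authors' MSD implementation. The `MultivariateShapelet` class, the use of the maximum per-dimension Euclidean distance as the multivariate distance, the single threshold per shapelet, the helper names, and the toy data are all hypothetical. Shapelet extraction and threshold learning are not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class MultivariateShapelet:
    # Hypothetical representation: one equal-length segment per dimension,
    # a distance threshold, and the class label the shapelet indicates.
    segments: List[List[float]]
    threshold: float
    label: str


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shapelet_distance(shapelet: MultivariateShapelet,
                      series: List[List[float]], end: int) -> float:
    """Distance between the shapelet and the window of `series` ending at
    index `end` (exclusive). Assumption: the multivariate distance is the
    maximum of the per-dimension Euclidean distances."""
    length = len(shapelet.segments[0])
    start = end - length
    return max(euclidean(seg, dim[start:end])
               for seg, dim in zip(shapelet.segments, series))


def classify_early(series: List[List[float]],
                   shapelets: List[MultivariateShapelet]
                   ) -> Optional[Tuple[str, int]]:
    """Return (label, t) for the first time point t at which some shapelet
    matches within its threshold; if several match at the same t, the
    closest one wins. Return None if no shapelet ever matches."""
    n = len(series[0])
    for t in range(1, n + 1):  # t = number of time points observed so far
        best = None
        for s in shapelets:
            if len(s.segments[0]) > t:
                continue  # not enough data observed yet for this shapelet
            d = shapelet_distance(s, series, t)
            if d <= s.threshold and (best is None or d < best[0]):
                best = (d, s.label)
        if best is not None:
            return best[1], t
    return None


# Toy example: two dimensions, ten time points (made-up data).
series = [
    [9.0, 9.0, 0.0, 1.0, 2.0, 9.0, 9.0, 9.0, 9.0, 9.0],
    [0.0, 0.0, 5.0, 5.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
shapelets = [
    MultivariateShapelet([[0.0, 1.0, 2.0], [5.0, 5.0, 5.0]], 0.5, "infected"),
    MultivariateShapelet([[9.0, 9.0, 9.0], [0.0, 0.0, 0.0]], 0.5, "control"),
]

label, t = classify_early(series, shapelets)
print(label, t, t / len(series[0]))  # fraction of the series used
```

Here the "infected" shapelet matches at t = 5, so the decision uses 5 of 10 time points, or 50% of the series. The "control" shapelet would only match later, at t = 8, and is never reached. Reporting the fraction of the series observed at decision time is one simple way to express earliness, comparable in kind to the percentages quoted in the results.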
The proposed early classification method for multivariate time series has been evaluated on eight gene expression datasets from viral infection and drug response studies in humans. In our experiments, the MSD method outperformed the baseline methods, achieving highly accurate classification by using as little as 40%-64% of the time series. The obtained results provide evidence that using conventional classification methods on short time series is not as accurate as using the proposed method, which is specialized for early classification.
For the early classification task, we proposed a method called Multivariate Shapelets Detection (MSD), which extracts patterns from all dimensions of the time series. We showed that the MSD method can classify the time series early by using as little as 40%-64% of the time series' length.