IEEE Trans Biomed Eng. 2021 Jun;68(6):1978-1989. doi: 10.1109/TBME.2020.3045720. Epub 2021 May 21.
When training machine learning models, we often assume that the training data and evaluation data are sampled from the same distribution. However, this assumption is violated when the model is evaluated on another unseen but similar database, even if that database contains the same classes. This problem is caused by domain shift and can be addressed using two approaches: domain adaptation and domain generalization. Put simply, domain adaptation methods can access data from the unseen domains during training, whereas in domain generalization the unseen data are not available during training. Hence, domain generalization concerns models that perform well on inaccessible, domain-shifted data.
Our proposed domain generalization method represents an unseen domain using a set of known basis domains, after which we classify the unseen domain using classifier fusion. To demonstrate our system, we employ a collection of heart sound databases that contain two classes of sounds, normal and abnormal.
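As a rough illustration of the idea of representing an unseen domain through known basis domains and then fusing per-domain classifiers, consider the following minimal sketch. It is not the method of this study: the distance-to-centroid similarity, the softmax weighting, the `temperature` parameter, and the names `domain_weights` and `fuse` are all assumptions made here for illustration, and the toy classifiers simply return fixed probabilities.

```python
import math

def domain_weights(x, centroids, temperature=1.0):
    """Soft-assign a sample to basis domains (assumed scheme, not the paper's).

    Each basis domain is summarized by a centroid in feature space; closer
    centroids receive larger weights via a softmax over negative distances.
    """
    dists = [math.dist(x, c) for c in centroids]
    logits = [-d / temperature for d in dists]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(x, centroids, classifiers, temperature=1.0):
    """Fuse per-domain class probabilities with the domain weights."""
    weights = domain_weights(x, centroids, temperature)
    per_domain = [clf(x) for clf in classifiers]  # one probability list per domain
    n_classes = len(per_domain[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, per_domain))
        for k in range(n_classes)
    ]

# Hypothetical toy setup: three basis domains, two classes (normal, abnormal).
centroids = [(0.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
classifiers = [
    lambda x: [0.9, 0.1],  # stand-in for a classifier trained on basis domain 0
    lambda x: [0.2, 0.8],  # stand-in for basis domain 1
    lambda x: [0.5, 0.5],  # stand-in for basis domain 2
]
sample = (0.5, 0.2)  # feature vector from an unseen domain
weights = domain_weights(sample, centroids)
fused = fuse(sample, centroids, classifiers)
print(weights, fused)
```

In this sketch the sample lies closest to the first centroid, so the classifier associated with that basis domain dominates the fused prediction; a real system would replace the centroids and stand-in classifiers with learned components.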
Our proposed classifier fusion method achieves accuracy gains of up to 16% for four completely unseen domains.
Recognizing the complexity induced by the inherent temporal nature of biosignal data, we propose a two-stage method that simplifies the overall process of domain generalization while achieving good results on both the unseen domains and the adopted basis domains.
To the best of our knowledge, this is the first study to investigate domain generalization for biosignal data. Our proposed learning strategy can be used to effectively learn domain-relevant features while remaining aware of the class differences in the data.