Department of Artificial Intelligence, Kongju National University, Cheonan 31080, Korea.
School of Computer Science, University of Seoul, Seoul 02504, Korea.
Sensors (Basel). 2022 Jun 14;22(12):4483. doi: 10.3390/s22124483.
We propose a method, called bi-point input, for convolutional neural networks (CNNs) that handle variable-length input features (e.g., speech utterances). Feeding input features into a CNN in mini-batches requires that all features in a mini-batch have the same shape, so a set of variable-length features cannot be fed into a CNN directly. Feature segmentation is the dominant way for CNNs to handle variable-length features: each feature is decomposed into fixed-length segments, and the CNN receives one segment as an input at a time. As a result, the CNN can consider only the information in one segment at a time, not the entire feature. This limits the amount of information available at one time and consequently leads to suboptimal solutions. Our proposed method alleviates this problem by increasing the amount of information available at one time. With the proposed method, a CNN receives a pair of segments obtained from the same feature as a single input. The two segments generally cover different time ranges and therefore carry different information. We also propose various methods for combining the two segments and provide rough guidance for setting a proper segment length without evaluation. We evaluate the proposed method on spoofing detection tasks using the ASVspoof 2019 database under various conditions. The experimental results show that the proposed method achieves relative equal error rate (EER) reductions of approximately 17.2% and 43.8% on average for the logical access (LA) and physical access (PA) tasks, respectively.
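The idea of feeding a pair of fixed-length segments from one variable-length feature can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify how the two segments are chosen or combined, so the uniform random start points, the repeat-padding of short features, the stacking of the two segments along a new channel axis, and the names `sample_segment`, `bi_point_input`, and `make_batch` are all assumptions made here for illustration.

```python
import numpy as np


def sample_segment(feature, seg_len, rng):
    """Cut one fixed-length segment from a (n_frames, n_bins) feature.

    Assumption: features shorter than seg_len are repeated along time
    until they are long enough; the start frame is drawn uniformly.
    """
    n_frames = feature.shape[0]
    if n_frames < seg_len:
        reps = -(-seg_len // n_frames)  # ceiling division
        feature = np.tile(feature, (reps, 1))
        n_frames = feature.shape[0]
    start = rng.integers(0, n_frames - seg_len + 1)  # upper bound is exclusive
    return feature[start:start + seg_len]


def bi_point_input(feature, seg_len, rng):
    """Draw two segments from the same feature and pair them.

    Assumption: the pair is combined by stacking along a new leading
    (channel-like) axis, giving shape (2, seg_len, n_bins). This is one
    hypothetical combination, not necessarily one used in the paper.
    """
    first = sample_segment(feature, seg_len, rng)
    second = sample_segment(feature, seg_len, rng)
    return np.stack([first, second], axis=0)


def make_batch(features, seg_len, seed=0):
    """Build a same-shape mini-batch from variable-length features."""
    rng = np.random.default_rng(seed)
    return np.stack([bi_point_input(f, seg_len, rng) for f in features])


# Three utterance-like features with different numbers of frames.
features = [np.random.rand(n, 60) for n in (50, 120, 300)]
batch = make_batch(features, seg_len=100)
print(batch.shape)  # (3, 2, 100, 60)
```

Every item in the batch has the same shape regardless of the original feature length, which is the requirement for mini-batch input stated above, while each item carries two time ranges instead of one.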