Sun Shengzi, Guo Binghui, Mi Zhilong, Zheng Zhiming
Beijing Advanced Innovation Center for Big Data and Brain Computing and NLSDE, Beihang University, Beijing, 100191, China.
Peng Cheng Laboratory, Shenzhen, 518055, Guangdong Province, China.
Sci Rep. 2021 Oct 13;11(1):20319. doi: 10.1038/s41598-021-92750-7.
Cross-modal retrieval has become a popular topic, since multi-modal data is heterogeneous and the similarities between different forms of information merit attention. Traditional single-modal methods reconstruct the original information but do not consider the semantic similarity between different types of data. In this work, a cross-modal semantic autoencoder with embedding consensus (CSAEC) is proposed, which maps the original data to a low-dimensional shared space to retain semantic information. To account for the similarity between modalities, an autoencoder is used to associate the feature projection with the semantic code vector. In addition, regularization and sparsity constraints are applied to the low-dimensional matrices to balance reconstruction errors. The high-dimensional data is transformed into semantic code vectors, and the models for different modalities are constrained by parameters to achieve denoising. Experiments on four multi-modal data sets show that query results are improved and effective cross-modal retrieval is achieved. Further, CSAEC can also be applied to related fields in computing and networking, such as deep learning and subspace learning. The model overcomes obstacles in traditional methods by using deep learning to convert multi-modal data into abstract representations, which yields better accuracy and better recognition results.
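To illustrate the idea of projecting each modality into a shared semantic code space, here is a minimal sketch. It is not the CSAEC model. It fits a generic tied-weight linear semantic autoencoder per modality, minimising ||X - W^T S||^2 + lam * ||W X - S||^2, and omits the sparsity and embedding-consensus terms mentioned in the abstract. The one-hot label codes, the synthetic data, and all function names are assumptions made for this example.

```python
import numpy as np


def fit_semantic_autoencoder(X, S, lam=1.0):
    """Fit a tied-weight linear autoencoder W of shape (k, d).

    X: (d, n) features of one modality; S: (k, n) semantic codes.
    Minimises ||X - W.T @ S||^2 + lam * ||W @ X - S||^2. Setting the
    gradient to zero gives the Sylvester equation A @ W + W @ B = C.
    """
    k, d = S.shape[0], X.shape[0]
    A = S @ S.T
    B = lam * (X @ X.T)
    C = (1.0 + lam) * (S @ X.T)
    # Dense Kronecker-product solve; only sensible for toy sizes.
    K = np.kron(np.eye(d), A) + np.kron(B.T, np.eye(k))
    w = np.linalg.solve(K, C.flatten(order="F"))
    return w.reshape((k, d), order="F")


def cross_modal_retrieve(W_query, x_query, W_gallery, X_gallery):
    """Return the index of the gallery item closest to the query,
    using cosine similarity in the shared code space."""
    q = W_query @ x_query
    G = W_gallery @ X_gallery
    q = q / np.linalg.norm(q)
    G = G / np.linalg.norm(G, axis=0, keepdims=True)
    return int(np.argmax(q @ G))


def make_toy_modality(rng, labels, dim, n_classes, noise=0.05):
    """Synthetic modality (assumption): orthonormal class prototypes plus noise."""
    protos, _ = np.linalg.qr(rng.normal(size=(dim, n_classes)))
    return protos[:, labels] + noise * rng.normal(size=(dim, labels.size))


def demo_accuracy(seed=0):
    """Image-to-text retrieval accuracy on the toy data (same data for fit and query)."""
    rng = np.random.default_rng(seed)
    labels = np.arange(60) % 3
    S = np.eye(3)[:, labels]  # one-hot semantic codes, shape (3, 60)
    X_img = make_toy_modality(rng, labels, dim=8, n_classes=3)
    X_txt = make_toy_modality(rng, labels, dim=5, n_classes=3)
    W_img = fit_semantic_autoencoder(X_img, S)
    W_txt = fit_semantic_autoencoder(X_txt, S)
    hits = [
        labels[cross_modal_retrieve(W_img, X_img[:, i], W_txt, X_txt)] == labels[i]
        for i in range(labels.size)
    ]
    return float(np.mean(hits))


if __name__ == "__main__":
    print("toy image-to-text retrieval accuracy:", demo_accuracy())
```

Because both modalities are encoded against the same code matrix S, a query from one modality can be compared directly with items from the other. A real system would evaluate on held-out queries and would replace the dense Kronecker solve with a dedicated Sylvester solver.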