
Similarity measure method of near-infrared spectrum combined with multi-attribute information.

Authors

Zhang Jinfeng, Qin Yuhua, Tian Rongkun, Bai Xiaoli, Liu Jing

Affiliations

College of Information Science and Technology, Qingdao University of Science and Technology, China.

Publication information

Spectrochim Acta A Mol Biomol Spectrosc. 2024 Dec 5;322:124783. doi: 10.1016/j.saa.2024.124783. Epub 2024 Jul 4.

Abstract

Near-infrared (NIR) spectral data are high-dimensional, redundant, and non-linear, and sample attributes such as producing area and grade also vary; all of these factors affect the similarity measure between samples. This paper proposes a t-distributed stochastic neighbor embedding algorithm based on the Sinkhorn distance (St-SNE) that incorporates multi-attribute data. First, the Sinkhorn distance is introduced to address problems such as the asymmetry of the KL divergence and the sparse distribution of data in high-dimensional space, so that the probability distribution constructed in the low-dimensional space resembles that of the high-dimensional space. In addition, to account for the effect of the samples' multi-attribute features on the similarity measure, a multi-attribute distance matrix is constructed using information entropy and then combined with the numerical matrix of the spectral data to obtain a mixed data matrix. To validate St-SNE, dimensionality-reduction projections of NIR spectral data were compared with those of the PCA, LPP, and t-SNE algorithms. St-SNE effectively distinguished samples with different attribute information and produced more distinct boundaries between sample categories in the low-dimensional space. The classification performance of St-SNE for different attributes was then tested on tobacco and mango datasets and compared with that of the LPP, t-SNE, UMAP, and Fisher t-SNE algorithms; St-SNE achieved the highest classification accuracy for the different attributes. Finally, each algorithm was used to search for the sample most similar to a target tobacco in cigarette formulas, and the St-SNE results agreed with the experts' recommendations more closely than those of the other algorithms. The method can therefore provide strong support for the maintenance and design of product formulas.
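
The abstract does not say how St-SNE plugs the Sinkhorn distance into its objective, so the following is only a generic sketch of the Sinkhorn iteration for two small histograms, not the authors' implementation. The function name `sinkhorn_distance`, the regularisation value `reg`, the iteration count, and the toy data are all assumptions made for illustration. It shows one property the abstract alludes to: with a symmetric cost matrix, the converged distance is the same in both directions, unlike the KL divergence.

```python
import math


def sinkhorn_distance(p, q, cost, reg=1.0, n_iter=500):
    """Entropy-regularised optimal-transport cost between histograms p and q.

    p, q : lists of non-negative weights, each summing to 1
    cost : cost[i][j] is the cost of moving mass from bin i of p to bin j of q
    reg  : entropic regularisation strength (illustrative default, an assumption)
    """
    n, m = len(p), len(q)
    # Gibbs kernel K = exp(-cost / reg)
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns so the plan's marginals match p and q
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P[i][j] = u[i] * K[i][j] * v[j]; return its total transport cost
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


# Toy data (hypothetical): three bins on a line, cost = squared index distance
cost = [[float((i - j) ** 2) for j in range(3)] for i in range(3)]
p = [0.7, 0.2, 0.1]
q = [0.1, 0.2, 0.7]

d_pq = sinkhorn_distance(p, q, cost)
d_qp = sinkhorn_distance(q, p, cost)
d_pp = sinkhorn_distance(p, p, cost)
print(d_pq, d_qp, d_pp)
```

Here `d_pq` and `d_qp` agree up to numerical tolerance. `d_pp` is smaller than `d_pq` but not zero, because the entropic term spreads the transport plan; a smaller `reg` brings the value closer to the unregularised optimal-transport cost, at the price of slower convergence.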

