Similarity measure method of near-infrared spectrum combined with multi-attribute information.

Authors

Zhang Jinfeng, Qin Yuhua, Tian Rongkun, Bai Xiaoli, Liu Jing

Affiliations

College of Information Science and Technology, Qingdao University of Science and Technology, China.

Publication information

Spectrochim Acta A Mol Biomol Spectrosc. 2024 Dec 5;322:124783. doi: 10.1016/j.saa.2024.124783. Epub 2024 Jul 4.

DOI:10.1016/j.saa.2024.124783
PMID:38972098
Abstract

The similarity measure between samples of near-infrared (NIR) spectral data is affected by the high dimensionality, redundancy, and non-linearity of the spectra, as well as by sample attributes such as producing area and grade. This paper proposes a t-distributed stochastic neighbor embedding algorithm based on the Sinkhorn distance (St-SNE) that incorporates multi-attribute information. First, the Sinkhorn distance is introduced to address problems such as the asymmetry of the KL divergence and the sparse distribution of data in high-dimensional space, so that the probability distribution constructed in the low-dimensional space is similar to that of the high-dimensional space. In addition, to account for the impact of multi-attribute sample features on the similarity measure, a multi-attribute distance matrix is constructed using information entropy and then combined with the numerical matrix of the spectral data to obtain a mixed data matrix. To validate the St-SNE algorithm, dimensionality-reduction projection was performed on NIR spectral data and compared with the PCA, LPP, and t-SNE algorithms. The results show that St-SNE effectively distinguishes samples with different attribute information and produces more distinct projection boundaries between sample categories in the low-dimensional space. The classification performance of St-SNE for different attributes was then tested on tobacco and mango datasets and compared with the LPP, t-SNE, UMAP, and Fisher t-SNE algorithms; St-SNE achieved the highest classification accuracy for the different attributes. Finally, the algorithms were compared on the task of searching for the sample most similar to a target tobacco for cigarette formulas, and St-SNE showed the highest consistency with the experts' recommendations. The method can provide strong support for the maintenance and design of product formulas.
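The abstract names two building blocks, the Sinkhorn distance and an entropy-based multi-attribute distance matrix, without giving formulas. The following is a minimal sketch of how such pieces might look, not the authors' implementation. It assumes the standard entropy-regularised optimal-transport form of the Sinkhorn distance, Shannon-entropy weights over categorical attributes, and a simple weighted sum to mix the two distance matrices. All function names and the parameters `reg`, `n_iter`, and `alpha` are hypothetical, and the way St-SNE embeds the Sinkhorn distance in the t-SNE objective is not shown.

```python
import numpy as np


def sinkhorn_distance(p, q, cost, reg=0.5, n_iter=500):
    """Entropy-regularised transport cost between histograms p and q.

    Assumes p and q are strictly positive and each sums to 1.
    """
    kernel = np.exp(-cost / reg)              # Gibbs kernel of the cost matrix
    u = np.ones_like(p)
    for _ in range(n_iter):                   # alternate marginal scalings
        v = q / (kernel.T @ u)
        u = p / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]   # approximate transport plan
    return float(np.sum(plan * cost))


def entropy_weights(attrs):
    """Shannon entropy of each categorical column, normalised to sum to 1."""
    entropies = []
    for column in attrs.T:
        _, counts = np.unique(column, return_counts=True)
        prob = counts / counts.sum()
        entropies.append(-np.sum(prob * np.log(prob)))
    entropies = np.asarray(entropies)
    total = entropies.sum()
    if total == 0:                            # every attribute is constant
        return np.full(len(entropies), 1.0 / len(entropies))
    return entropies / total


def attribute_distance(attrs):
    """Entropy-weighted mismatch distance between every pair of samples."""
    weights = entropy_weights(attrs)
    mismatch = attrs[:, None, :] != attrs[None, :, :]   # shape (n, n, k)
    return (mismatch * weights).sum(axis=2)


def mixed_distance(spectra, attrs, alpha=0.5):
    """Weighted sum of scaled spectral and attribute distances (assumed form)."""
    diff = spectra[:, None, :] - spectra[None, :, :]
    d_spec = np.sqrt((diff ** 2).sum(axis=2))           # pairwise Euclidean
    d_spec = d_spec / max(d_spec.max(), 1e-12)          # scale to [0, 1]
    return alpha * d_spec + (1.0 - alpha) * attribute_distance(attrs)
```

Under these assumptions, `mixed_distance(spectra, attrs)` takes an `(n, d)` array of spectra and an `(n, k)` array of categorical labels such as producing area and grade, and returns an `(n, n)` matrix that could be passed to a neighbor-embedding method accepting precomputed distances.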
