Disentangled global and local features of multi-source data variational autoencoder: An interpretable model for diagnosing IgAN via multi-source Raman spectral fusion techniques.

Authors

Shuai Wei, Tian Xuecong, Zuo Enguang, Zhang Xueqin, Lu Chen, Gu Jin, Chen Chen, Lv Xiaoyi, Chen Cheng

Affiliations

College of Software, Xinjiang University, Urumqi 830046, China.

College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China.

Publication information

Artif Intell Med. 2025 Feb;160:103053. doi: 10.1016/j.artmed.2024.103053. Epub 2024 Dec 12.

Abstract

A single Raman spectrum reflects only limited molecular information, and effectively fusing serum and urine Raman spectra (two source domains) yields richer features. However, most current Raman-based studies of immunoglobulin A nephropathy (IgAN) rely on small samples and spectra with a low signal-to-noise ratio, so directly applying a multi-source fusion strategy may even reduce diagnostic accuracy. To address this, this paper proposes a data augmentation and spectral optimization method based on variational autoencoders that produces reconstructed Raman spectra with a doubled sample size and an improved signal-to-noise ratio. For diagnosing IgAN from multi-source Raman spectra, the paper builds a variational autoencoder with disentangled global and local features for multi-source data (DMSGL-VAE). First, statistical features are extracted from the segmented spectra, and a disentanglement module separates the latent variables produced by the variational encoder. The resulting global and local representations capture, respectively, the information shared by the serum and urine source domains and the information unique to each. Next, a cross-source reconstruction loss and a disentanglement loss constrain the separation, and its effectiveness is demonstrated both quantitatively and qualitatively. Finally, the features of the different source domains are integrated to diagnose IgAN, and the SHapley Additive exPlanations (SHAP) algorithm is used to identify the important features. In experiments, the DMSGL-VAE model reached an AUC of 0.9958 for diagnosing IgAN on the test set. The SHAP analysis further suggests that proteins, hydroxybutyrate, and guanine are likely common biological fingerprint substances for diagnosing IgAN from serum and urine Raman spectra. In summary, the Raman-based DMSGL-VAE model can achieve rapid, non-invasive, and accurate screening for IgAN, and its interpretability analysis may help doctors better understand IgAN and adopt more efficient diagnostic measures in the future.
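
The abstract names three mechanical steps without giving details: extracting statistical features from spectral segments, disentangling (decoupling) the latent variables into global and local parts, and a cross-source reconstruction loss. The following is a minimal NumPy sketch of how such steps could look, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the choice of mean/std/min/max as statistics, equal-width segments, a simple slice as the global/local split, and swapping the global parts between sources before decoding.

```python
import numpy as np

def segment_statistics(spectrum, n_segments=8):
    """Split a 1-D spectrum into contiguous segments and return
    per-segment (mean, std, min, max), flattened into one vector.
    The choice of statistics is an assumption, not taken from the paper."""
    segments = np.array_split(np.asarray(spectrum, dtype=float), n_segments)
    feats = [(s.mean(), s.std(), s.min(), s.max()) for s in segments]
    return np.asarray(feats).ravel()

def split_latent(z, global_dim):
    """Hypothetical disentanglement step: treat the first `global_dim`
    latent dimensions as the shared (global) part and the remainder as
    the source-specific (local) part."""
    return z[..., :global_dim], z[..., global_dim:]

def cross_source_reconstruction_loss(decode_serum, decode_urine,
                                     z_serum, z_urine,
                                     x_serum, x_urine, global_dim):
    """Assumed form of a cross-source loss: each source is decoded from
    the OTHER source's global part plus its own local part, and compared
    with its original input by mean squared error."""
    g_s, l_s = split_latent(z_serum, global_dim)
    g_u, l_u = split_latent(z_urine, global_dim)
    x_serum_hat = decode_serum(np.concatenate([g_u, l_s], axis=-1))
    x_urine_hat = decode_urine(np.concatenate([g_s, l_u], axis=-1))
    return (np.mean((x_serum_hat - x_serum) ** 2)
            + np.mean((x_urine_hat - x_urine) ** 2))
```

If the global parts of the two sources really carry the same shared information, swapping them should barely change the reconstructions, so a low value of this loss is one plausible signal that the split is doing what it is meant to do. The `decode_serum` and `decode_urine` arguments stand in for whatever decoders a real model would use.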

