Random splicing assisted deep learning for breast cancer cell line classification via Raman spectroscopy.

Authors

Liu Yiheng, Liu Junfeng, Wan Jiayi, Hao Hongke, Liu Guangxing, Huang Xia

Affiliations

Department of Biosciences and Bioinformatics, Xi'an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, PR China.

Innovation Academy for Precision Measurement Science and Technology, Chinese Academy of Sciences, Wuhan 430071, PR China.

Publication information

Comput Struct Biotechnol J. 2025 May 30;27:2288-2297. doi: 10.1016/j.csbj.2025.05.051. eCollection 2025.

Abstract

Raman spectroscopy extracts rich biochemical information from single cells, demonstrating significant potential for precise cancer identification. While machine learning makes spectral analysis more efficient, conventional models remain constrained by data volume. Here, we developed the Random Splicing-Convolutional Neural Network (RS-CNN), a deep learning framework that addresses data scarcity through spectral concatenation. By randomly splicing Raman spectra from the same cell line, RS-CNN enhances distinctive spectral features while simultaneously expanding the dataset and improving signal quality. Validation across six breast cancer cell lines demonstrated RS-CNN's superiority over five benchmark models (SVM, LDA, PCA-SVM, PCA-LDA, CNN). With 450 spectra per cell line, RS-CNN achieved 98.63% classification accuracy, compared with around 85% for the conventional models. Under data-limited conditions (100 spectra per cell line), RS-CNN maintained 91.47% accuracy, outperforming the CNN's 70.83%. RS-CNN's generalizability was further validated on an independently acquired dataset, on which it achieved at least 94% classification accuracy. SHAP analysis suggested that the spectral region around 980 cm⁻¹ was significant for cancer diagnosis, while the 1158-1160 cm⁻¹ and 1603-1607 cm⁻¹ regions were particularly valuable for distinguishing between cancer subtypes. These findings establish RS-CNN as a robust analytical model for clinical Raman diagnostics, particularly valuable in applications requiring high accuracy with limited samples.
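
The random splicing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that spectra from the same cell line are randomly spliced (concatenated), so the function name random_splice, the number of spectra per spliced sample (n_splice), the number of spliced samples per class (n_out_per_class), and sampling without replacement are all assumptions made for illustration.

```python
import numpy as np

def random_splice(spectra, labels, n_splice=3, n_out_per_class=1000, seed=0):
    """Concatenate n_splice randomly chosen spectra of the same class.

    Hypothetical helper: parameter names and defaults are illustrative
    assumptions, not values taken from the paper.
    spectra: array of shape (n_samples, n_points)
    labels:  array of shape (n_samples,)
    Returns (spliced, spliced_labels) where spliced has shape
    (n_classes * n_out_per_class, n_splice * n_points).
    """
    rng = np.random.default_rng(seed)
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)
    out_x, out_y = [], []
    for cls in np.unique(labels):
        pool = spectra[labels == cls]  # all spectra of one cell line
        if len(pool) < n_splice:
            raise ValueError(f"class {cls!r} has fewer than {n_splice} spectra")
        for _ in range(n_out_per_class):
            # Draw distinct spectra from this class and join them end to end.
            idx = rng.choice(len(pool), size=n_splice, replace=False)
            out_x.append(pool[idx].ravel())
            out_y.append(cls)
    return np.stack(out_x), np.array(out_y)
```

Because the number of possible combinations grows quickly with the pool size, a small set of measured spectra can yield many distinct spliced samples, which is one plausible reading of how splicing expands the dataset. In a sketch like this, the train/test split should be made before splicing so that no measured spectrum contributes to both sets.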

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae04/12162052/99944c6130b7/ga1.jpg
