Deep Learning-Based Identification of Tissue of Origin for Carcinomas of Unknown Primary Using MicroRNA Expression: Algorithm Development and Validation.

Authors

Raghu Ananya, Raghu Anisha, Wise Jillian F

Affiliations

Quarry Lane School, San Ramon, CA, United States.

Department of Biology and Biomedical Sciences, Salve Regina University, Newport, RI, United States.

Publication details

JMIR Bioinform Biotechnol. 2024 Jul 24;5:e56538. doi: 10.2196/56538.

Abstract

BACKGROUND

Carcinoma of unknown primary (CUP) is a subset of metastatic cancers in which the primary tissue source of the cancer cells remains unidentified. CUP is the eighth most common malignancy worldwide, accounting for up to 5% of all malignancies. It is exceptionally aggressive, with a median survival of approximately 3 to 6 months. The tissue in which a cancer arises plays a key role in our understanding of its sensitivity to various forms of cell death. Thus, not knowing the tissue of origin (TOO) makes it difficult to devise tailored and effective treatments for patients with CUP. Developing quick and clinically implementable methods to identify the TOO is crucial in treating these patients. Noncoding RNAs may hold potential for origin identification and provide a robust route to clinical implementation due to their resistance to chemical degradation.

OBJECTIVE

This study aims to investigate the potential of microRNAs, a subset of noncoding RNAs, as highly accurate biomarkers for detecting the TOO of metastatic cancers through data-driven machine learning approaches.

METHODS

We used microRNA expression data from The Cancer Genome Atlas data set and assessed various machine learning approaches, from simple classifiers to deep learning approaches. As a test of our classifiers, we evaluated the accuracy on a separate set of 194 primary tumor samples from the Sequence Read Archive. We used permutation feature importance to determine the potential microRNA biomarkers and assessed them with principal component analysis and t-distributed stochastic neighbor embedding visualizations.
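
The permutation feature importance step described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, a simple nearest-centroid model stands in for the classifiers the study assessed, and every name here (`make_data`, `fit_centroids`, `permutation_importance`) is hypothetical. The idea is to shuffle one feature column at a time on held-out data and record how far accuracy falls; features whose shuffling hurts most are the most important.

```python
import random

def make_data(n_per_class, rng):
    """Synthetic stand-in for expression data (assumption): 3 classes, 4 features.
    Only features 0 and 1 carry class signal; features 2 and 3 are pure noise."""
    centers = {0: (0.0, 0.0), 1: (4.0, 0.0), 2: (0.0, 4.0)}
    X, y = [], []
    for label, (c0, c1) in centers.items():
        for _ in range(n_per_class):
            X.append([rng.gauss(c0, 1.0), rng.gauss(c1, 1.0),
                      rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid classifier: store the per-class feature means."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {k: [s / counts[k] for s in acc] for k, acc in sums.items()}

def predict(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda k: sum((a - b) ** 2 for a, b in zip(row, centroids[k])))

def accuracy(centroids, X, y):
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)

def permutation_importance(centroids, X, y, n_repeats, rng):
    """Mean drop in accuracy when a single feature column is shuffled."""
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in X]
            rng.shuffle(column)
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

rng = random.Random(0)
X_train, y_train = make_data(100, rng)
X_test, y_test = make_data(100, rng)   # held-out set, never used for fitting

model = fit_centroids(X_train, y_train)
baseline, importances = permutation_importance(model, X_test, y_test,
                                               n_repeats=5, rng=rng)
ranking = sorted(range(len(importances)),
                 key=lambda j: importances[j], reverse=True)

print(f"held-out accuracy: {baseline:.3f}")
print("features ranked by importance:", ranking)
```

Because importance is measured on held-out samples, it reflects what the fitted model relies on for generalization rather than what it memorized. In a real analysis the feature columns would be microRNA expression values and the labels would be tissues of origin.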

RESULTS

Our results show that robust classifiers can be designed to detect the TOO of metastatic samples in The Cancer Genome Atlas data set, with an accuracy of up to 97% (351/362), and that such classifiers may be applicable to CUP. Deep learning techniques enhanced prediction accuracy: accuracy rose from 62.5% (226/362) with decision trees to 93.2% (337/362) with logistic regression and finally to 97% (351/362) with deep learning on metastatic samples. On the Sequence Read Archive validation set, the decision tree achieved an accuracy of 41.2% (77/188), while deep learning achieved 80.4% (151/188). Notably, our feature importance analysis identified microRNA-10b, microRNA-205, and microRNA-196b as the 3 most important features for predicting TOO, which aligns with previous work.

CONCLUSIONS

Our findings highlight the potential of using machine learning techniques to devise accurate tests for detecting TOO for CUP. Since microRNAs are carried throughout the body via extracellular vesicles secreted from cells, they may serve as key biomarkers for liquid biopsy due to their presence in blood plasma. Our work serves as a foundation toward developing blood-based cancer detection tests based on the presence of microRNA.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b09b/11306940/6a861c82777d/bioinform_v5i1e56538_fig1.jpg
