A novel machine learning algorithm selects proteome signature to specifically identify cancer exosomes.

Author information

Li Bingrui, Kugeratski Fernanda G, Kalluri Raghu

Publication information

bioRxiv. 2023 Dec 20:2023.07.18.549557. doi: 10.1101/2023.07.18.549557.

Abstract

Non-invasive early cancer diagnosis remains challenging due to the low sensitivity and specificity of current diagnostic approaches. Exosomes are membrane-bound nanovesicles secreted by all cells that contain DNA, RNA, and proteins that are representative of the parent cells. This property, along with the abundance of exosomes in biological fluids, makes them compelling candidates as biomarkers. However, a rapid and flexible exosome-based diagnostic method to distinguish human cancers across cancer types in diverse biological fluids is yet to be defined. Here, we describe a novel machine learning-based computational method to distinguish cancers using a panel of proteins associated with exosomes. Employing datasets of exosome proteins from human cell lines, tissue, plasma, serum, and urine samples from a variety of cancers, we identify Clathrin Heavy Chain (CLTC), Ezrin (EZR), Talin-1 (TLN1), Adenylyl cyclase-associated protein 1 (CAP1), and Moesin (MSN) as highly abundant universal biomarkers for exosomes. We further define three panels of pan-cancer exosome proteins that distinguish cancer exosomes from other exosomes and aid in classifying cancer subtypes employing random forest models. All the models using proteins from plasma-, serum-, or urine-derived exosomes yield AUROC scores higher than 0.91 and demonstrate superior performance compared to Support Vector Machine, K-Nearest Neighbor Classifier, and Gaussian Naive Bayes. This study provides a reliable protein biomarker signature associated with cancer exosomes with scalable machine learning capability for a sensitive and specific non-invasive method of cancer diagnosis.
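The model comparison described in the abstract (a random forest evaluated by AUROC against a Support Vector Machine, a K-Nearest Neighbor classifier, and Gaussian Naive Bayes) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not name a library, so scikit-learn is an assumption; the data are synthetic stand-ins for a samples-by-proteins abundance matrix; and the function name `compare_classifiers` and all hyperparameters are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def compare_classifiers(seed=0):
    """Fit four classifiers on synthetic data and return their test AUROC."""
    # Assumption: synthetic binary data in place of real exosome proteomes
    # (rows = samples, columns = protein abundances, label = cancer or not).
    X, y = make_classification(
        n_samples=300, n_features=20, n_informative=5, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )

    # Each entry pairs a model with the method that yields a continuous score.
    # Hyperparameters are illustrative, not taken from the paper.
    models = {
        "random_forest": (
            RandomForestClassifier(n_estimators=200, random_state=seed),
            "predict_proba",
        ),
        "svm": (SVC(), "decision_function"),
        "knn": (KNeighborsClassifier(n_neighbors=5), "predict_proba"),
        "gaussian_nb": (GaussianNB(), "predict_proba"),
    }

    scores = {}
    for name, (model, method) in models.items():
        model.fit(X_train, y_train)
        raw = getattr(model, method)(X_test)
        # predict_proba returns one column per class; keep the positive class.
        if raw.ndim == 2:
            raw = raw[:, 1]
        scores[name] = roc_auc_score(y_test, raw)
    return scores


if __name__ == "__main__":
    for name, auc in compare_classifiers().items():
        print(f"{name}: AUROC = {auc:.3f}")
```

On synthetic data the ranking of the four models carries no meaning; the sketch only shows how one held-out split can be scored with the same metric for each classifier.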
