Department of Cancer Biology, University of Texas MD Anderson Cancer Center, Houston, United States.
Department of Bioengineering, Rice University, Houston, United States.
eLife. 2024 Mar 26;12:RP90390. doi: 10.7554/eLife.90390.
Non-invasive early cancer diagnosis remains challenging due to the low sensitivity and specificity of current diagnostic approaches. Exosomes are membrane-bound nanovesicles secreted by all cells; they contain DNA, RNA, and proteins that are representative of the parent cells. This property, along with the abundance of exosomes in biological fluids, makes them compelling candidates as biomarkers. However, a rapid and flexible exosome-based diagnostic method to distinguish human cancers across cancer types in diverse biological fluids is yet to be defined. Here, we describe a novel machine learning-based computational method to distinguish cancers using a panel of proteins associated with exosomes. Employing datasets of exosome proteins from human cell lines, tissue, plasma, serum, and urine samples from a variety of cancers, we identify Clathrin Heavy Chain (CLTC), Ezrin (EZR), Talin-1 (TLN1), Adenylyl cyclase-associated protein 1 (CAP1), and Moesin (MSN) as highly abundant universal biomarkers for exosomes. We also define three panels of pan-cancer exosome proteins that, used in random forest models, distinguish cancer exosomes from other exosomes and aid in classifying cancer subtypes. All the models using proteins from plasma-, serum-, or urine-derived exosomes yield AUROC scores higher than 0.91 and demonstrate superior performance compared to Support Vector Machine, K-Nearest Neighbor, and Gaussian Naive Bayes classifiers. This study provides a reliable protein biomarker signature associated with cancer exosomes with scalable machine learning capability for a sensitive and specific non-invasive method of cancer diagnosis.
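The abstract reports model quality as AUROC (area under the receiver operating characteristic curve). As a reading aid, the sketch below computes AUROC from its pairwise definition: the probability that a randomly chosen positive (cancer) sample receives a higher score than a randomly chosen negative sample, with ties counted as one half. The function name `auroc` and the labels and scores are hypothetical illustrations, not data or code from the study, and the study's own pipeline may compute the metric differently.

```python
def auroc(labels, scores):
    """Pairwise AUROC: P(score of a positive > score of a negative), ties count 0.5.

    labels: iterable of 0/1 class labels (1 = cancer exosome, 0 = other; assumed coding).
    scores: classifier scores, e.g. the predicted probability of class 1.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for eight samples (not from the paper).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.70, 0.40, 0.60, 0.30, 0.20, 0.10]

example_auroc = auroc(labels, scores)
print(example_auroc)  # 0.9375: 15 of the 16 positive/negative pairs are ranked correctly
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values above 0.91 should be read.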