Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark; Sino-Danish Center for Education and Research, Aarhus University, Aarhus, Denmark.
Center for Health Data Science, University of Copenhagen, Copenhagen, Denmark.
Gastroenterology. 2023 Jul;165(1):121-132.e5. doi: 10.1053/j.gastro.2023.03.208. Epub 2023 Mar 24.
BACKGROUND & AIMS: Colonic adenomatous polyps, or adenomas, are frequent precancerous lesions and the origin of most cases of colorectal adenocarcinoma. However, we know from epidemiologic studies that although most colorectal cancers (CRCs) originate from adenomas, only a small fraction of adenomas (3%-5%) ever progress to cancer. At present, there are no molecular markers to guide follow-up surveillance programs.
METHODS: We profiled, by mass spectrometry-based proteomics combined with machine learning analysis, a selected cohort of formalin-fixed, paraffin-embedded high-grade (HG) adenomas with long clinical follow-up, collected as part of the Danish national screening program. We grouped subjects in the cohort according to their subsequent history of findings: a nonmetachronous advanced neoplasia group (G0), with no new HG adenomas or CRCs up to 10 years after polypectomy, and a metachronous advanced neoplasia group (G1), in which individuals developed a new HG adenoma or CRC within 5 years of diagnosis.
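The grouping rule can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name, argument names, and the handling of edge cases are assumptions made here for illustration. In particular, the sketch assumes that "within 5 years" is inclusive, that G0 requires at least 10 years of lesion-free follow-up, and that subjects fitting neither definition are left unassigned.

```python
from typing import Optional


def assign_group(
    years_to_new_lesion: Optional[float],
    follow_up_years: float,
    g1_window: float = 5.0,   # G1: new HG adenoma or CRC within 5 years
    g0_window: float = 10.0,  # G0: no new HG adenoma or CRC up to 10 years
) -> Optional[str]:
    """Return "G1", "G0", or None (unassigned) for one subject.

    years_to_new_lesion is None when no new HG adenoma or CRC was recorded.
    All names and edge-case choices are hypothetical, not from the study.
    """
    # Metachronous advanced neoplasia: new lesion inside the 5-year window
    # (inclusive boundary is an assumption).
    if years_to_new_lesion is not None and years_to_new_lesion <= g1_window:
        return "G1"

    # Nonmetachronous: no lesion recorded inside the 10-year window, and
    # (assumption) follow-up long enough to cover that window.
    lesion_free = years_to_new_lesion is None or years_to_new_lesion > g0_window
    if lesion_free and follow_up_years >= g0_window:
        return "G0"

    # Fits neither definition (for example, a new lesion at 7 years, or
    # follow-up that is too short): left unassigned in this sketch.
    return None


if __name__ == "__main__":
    print(assign_group(3.0, 12.0))   # new lesion at 3 years
    print(assign_group(None, 10.0))  # lesion-free with 10 years of follow-up
    print(assign_group(7.0, 12.0))   # new lesion between 5 and 10 years
```

Returning None for the in-between cases keeps the two groups cleanly separated, in line with the description of a selected cohort; a real pipeline would need the study's actual inclusion criteria.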
RESULTS: We generated a proteome dataset from 98 selected HG adenoma samples, including 20 technical replicates, of which 45 samples belonged to the nonmetachronous advanced neoplasia group and 53 to the metachronous advanced neoplasia group. The clear separation of these 2 groups in a uniform manifold approximation and projection (UMAP) plot indicated that the information contained in the abundance of the ∼5000 proteins was sufficient to predict the future occurrence of HG adenomas or development of CRC.
CONCLUSIONS: We performed an in-depth analysis of quantitative proteomic data from 98 resected adenoma samples using various novel algorithms and statistical packages and found that their proteome can predict the development of metachronous advanced lesions and progression several years in advance.