Xiong Yichun, Li Jiaqi, Jin Wang, Sheng Xiaoran, Peng Hui, Wang Zhiyi, Jia Caifeng, Zhuo Lili, Zhang Yibo, Huang Jingzhe, Zhai Modi, Lyu Beibei, Sun Jie, Zhou Meng
School of Biomedical Engineering, Eye Hospital, Wenzhou Medical University, Wenzhou, 325027, P. R. China.
Sci Data. 2025 Apr 1;12(1):551. doi: 10.1038/s41597-025-04899-9.
Early detection and intervention of precancerous lesions are crucial for reducing cancer morbidity and mortality. Comprehensive analysis of genomic, transcriptomic, proteomic and epigenomic alterations can provide insights into the early stages of carcinogenesis. However, the lack of an integrated, well-curated resource of molecular signatures limits our understanding of precancerous processes. Here, we introduce the PreCancerous Molecular Resource (PCMR), a comprehensive resource that compiles 25,828 molecular profiles of precancerous samples paired with normal or malignant counterparts. These profiles cover precancerous lesions of 35 cancer types across 20 organs and tissues. They are derived from tissue samples, liquid biopsies, cell lines and organoids, and include transcriptomic, proteomic and epigenomic data. PCMR contains 62,566 precancer-gene associations, derived from differential analysis and from text mining with the ChatGPT large language model. We evaluated the reliability and significance of the PCMR dataset using authoritative precancerous molecular signatures and assessed its biological and clinical relevance. Overall, PCMR will serve as a valuable resource for advancing precancer research and ultimately improving patient outcomes.