Widayati Tyas Arum, Schneider Jadesada, Panteleeva Kseniia, Chernysheva Elizabeth, Hrbkova Natalie, Beck Stephan, Voloshin Vitaly, Chervova Olga
Medical Genomics Lab, Cancer Institute, University College London, London, United Kingdom.
Department of Genetics, Evolution and Environment, University College London, London, United Kingdom.
Front Genet. 2023 Oct 24;14:1258648. doi: 10.3389/fgene.2023.1258648. eCollection 2023.
Aberrant DNA methylation (DNAm) is known to be associated with the aetiology of cancer, including colorectal cancer (CRC). In the past, the availability of open access data has been the main driver of innovative method development and research training. However, this is increasingly being eroded by the move to controlled access, particularly of medical data, including cancer DNAm data. To rejuvenate this valuable tradition, we leveraged DNAm data from 1,845 samples (535 CRC tumours, 522 normal colon tissues adjacent to tumours, 72 colorectal adenomas, and 716 normal colon tissues from healthy individuals) from 14 open access studies deposited in NCBI GEO and ArrayExpress. We calculated each sample's epigenetic age (EA) using eleven epigenetic clock models and derived the corresponding epigenetic age acceleration (EAA). For EA, we observed that most first- and second-generation epigenetic clocks reflect the chronological age in normal tissues adjacent to tumours and healthy individuals [e.g., Horvath (r = 0.77 and 0.79), Zhang elastic net (EN) (r = 0.70 and 0.73)] unlike the epigenetic mitotic clocks (EpiTOC, HypoClock, MiAge) (r < 0.3). For EAA, we used PhenoAge, Wu, and the above mitotic clocks and found them to have distinct distributions in different tissue types, particularly between normal colon tissues adjacent to tumours and cancerous tumours, as well as between normal colon tissues adjacent to tumours and normal colon tissue from healthy individuals. Finally, we harnessed these associations to develop a classifier using elastic net regression (with lasso and ridge regularisations) that predicts CRC diagnosis based on a patient's sex and EAAs calculated from histologically normal controls (i.e., normal colon tissues adjacent to tumours and normal colon tissue from healthy individuals).
The classifier demonstrated good diagnostic potential with ROC-AUC = 0.886, which suggests that an EAA-based classifier trained on relevant data could become a tool to support diagnostic/prognostic decisions in CRC for clinical professionals. Our study also reemphasises the importance of open access clinical data for method development and training of young scientists. Obtaining the required approvals for controlled access data would not have been possible in the timeframe of this study.
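As an illustration of the two quantities reported above, the following is a minimal sketch, not the authors' pipeline. It assumes EAA is defined as the residual from a linear regression of epigenetic age on chronological age (one common definition; the abstract does not state which was used), and it computes ROC-AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The function names and the toy values are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def age_acceleration(chron_age, epi_age):
    """EAA as the residual of epigenetic age regressed on chronological age.

    Assumption: this residual definition is used only for illustration.
    """
    intercept, slope = fit_line(chron_age, epi_age)
    return [ea - (intercept + slope * ca) for ca, ea in zip(chron_age, epi_age)]


def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up toy values, not data from the study.
chron_age = [40, 50, 60, 70, 45, 55, 65, 75]
epi_age = [42, 51, 63, 74, 44, 53, 64, 72]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = hypothetical "CRC" label

eaa = age_acceleration(chron_age, epi_age)
auc = roc_auc(labels, eaa)
```

In the study, the score passed to the ROC analysis would come from the elastic net classifier fitted on sex and several EAAs; here a single raw EAA stands in for that score purely to show how the two calculations connect.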