Brigante Giulia, Lazzaretti Clara, Paradiso Elia, Nuzzo Federico, Sitti Martina, Tüttelmann Frank, Moretti Gabriele, Silvestri Roberto, Gemignani Federica, Försti Asta, Hemminki Kari, Elisei Rossella, Romei Cristina, Zizzi Eric Adriano, Deriu Marco Agostino, Simoni Manuela, Landi Stefano, Casarini Livio
Unit of Endocrinology, Department of Biomedical, Metabolic and Neural Sciences, University of Modena and Reggio Emilia, Modena, Italy.
Unit of Endocrinology, Department of Medical Specialties, Azienda Ospedaliero-Universitaria, Modena, Italy.
Eur Thyroid J. 2022 Sep 12;11(5). doi: 10.1530/ETJ-22-0058. Print 2022 Oct 1.
To identify a specific genetic combination predisposing to differentiated thyroid carcinoma (DTC), we selected a set of single nucleotide polymorphisms (SNPs) associated with DTC risk and used a polygenic risk score (PRS), Bayesian statistics and a machine learning (ML) classifier to describe cases and controls in three different datasets. Dataset 1 (649 DTC, 431 controls) had previously been genotyped in a genome-wide association study (GWAS) on Italian DTC. Dataset 2 (234 DTC, 101 controls) and dataset 3 (404 DTC, 392 controls) were genotyped. Association results for 171 SNPs reported in candidate studies to predispose to DTC were extracted from the GWAS of dataset 1, and the SNPs associated with DTC risk (P < 0.05) were then tested for replication in dataset 2. The reliability of the identified SNPs was confirmed by PRS and Bayesian statistics after merging the three datasets. The SNPs were then used by an ML classifier to describe the case/control status of individuals. Of the 171 SNPs previously reported as associated with DTC, 15 were associated with DTC risk in both datasets 1 and 2. Using these markers, the PRS showed that individuals in the fifth quintile had a sevenfold higher risk of DTC than those in the first quintile. Bayesian inference confirmed that the selected 15 SNPs differentiate cases from controls. These results were corroborated by ML, which reached a maximum AUC of about 0.7. A restricted selection of only 15 DTC-associated SNPs is able to describe the underlying genetic structure of Italian individuals, and ML allows a fair prediction of case or control status based solely on the individual genetic background.
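The PRS quintile comparison and the AUC mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not say how the PRS was weighted or which ML classifier was used. The sketch assumes a weighted sum of risk-allele counts, rank-based quintiles, a top-versus-bottom quintile odds ratio with a 0.5 continuity correction, and an AUC computed directly on the score. All function names, the per-SNP weight, the allele frequencies and the simulated genotypes are hypothetical.

```python
import math
import random


def polygenic_risk_score(genotype, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP)."""
    return sum(w * g for w, g in zip(weights, genotype))


def quintile_of(scores):
    """Assign each score a quintile (1..5) by rank; ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    quintiles = [0] * len(scores)
    for rank, i in enumerate(order):
        quintiles[i] = rank * 5 // len(scores) + 1
    return quintiles


def odds_ratio_top_vs_bottom(scores, is_case):
    """Odds of being a case in quintile 5 relative to quintile 1."""
    q = quintile_of(scores)

    def counts(k):
        cases = sum(1 for qi, c in zip(q, is_case) if qi == k and c)
        controls = sum(1 for qi, c in zip(q, is_case) if qi == k and not c)
        return cases, controls

    a, b = counts(5)  # cases, controls in the top quintile
    c, d = counts(1)  # cases, controls in the bottom quintile
    # Add 0.5 to every cell so an empty cell cannot cause division by zero.
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


def auc(scores, is_case):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    case_scores = [s for s, c in zip(scores, is_case) if c]
    ctrl_scores = [s for s, c in zip(scores, is_case) if not c]
    wins = sum((cs > ct) + 0.5 * (cs == ct)
               for cs in case_scores for ct in ctrl_scores)
    return wins / (len(case_scores) * len(ctrl_scores))


if __name__ == "__main__":
    rng = random.Random(0)
    n_snps = 15
    # Hypothetical weights: log odds ratio of 1.2 for every SNP.
    weights = [math.log(1.2)] * n_snps

    def simulate(risk_allele_freq):
        # Two independent allele draws per SNP (synthetic data only).
        return [sum(rng.random() < risk_allele_freq for _ in range(2))
                for _ in range(n_snps)]

    # Hypothetical frequencies: risk alleles slightly enriched in cases.
    genotypes = [simulate(0.36) for _ in range(500)] + \
                [simulate(0.30) for _ in range(500)]
    is_case = [True] * 500 + [False] * 500

    scores = [polygenic_risk_score(g, weights) for g in genotypes]
    print("OR, quintile 5 vs 1:", round(odds_ratio_top_vs_bottom(scores, is_case), 2))
    print("AUC of the score:", round(auc(scores, is_case), 3))
```

With real data, the weights would come from the per-SNP effect sizes estimated in the association analysis, and the AUC would be evaluated on individuals not used to select the SNPs or fit the model.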