Institute of Clinical Medicine, Pathology and Forensic Medicine, and Translational Cancer Research Area, University of Eastern Finland, P.O. Box 1627, FI-70211, Kuopio, Finland.
Institute of Clinical Medicine, Oncology, University of Eastern Finland, P.O. Box 1627, FI-70211, Kuopio, Finland.
Sci Rep. 2020 Jul 6;10(1):11044. doi: 10.1038/s41598-020-66907-9.
Breast cancer (BC) is a multifactorial disease and the most common cancer in women worldwide. We describe a machine learning approach to identify a combination of interacting genetic variants (SNPs) and demographic risk factors for BC, especially factors related to both familial history (Group 1) and oestrogen metabolism (Group 2), for predicting BC risk. This approach identifies the best combinations of interacting genetic and demographic risk factors that yield the highest BC risk prediction accuracy. In tests on the Kuopio Breast Cancer Project (KBCP) dataset, our approach achieves a mean average precision (mAP) of 77.78 in predicting BC risk by using interacting genetic and Group 1 features, which is better than the mAPs of 74.19 and 73.65 achieved using only Group 1 features and interacting SNPs, respectively. Similarly, using interacting genetic and Group 2 features yields an mAP of 78.00, outperforming the system based on only Group 2 features, which has an mAP of 72.57. Furthermore, the gene interaction maps built from genes associated with SNPs that interact with demographic risk factors indicate important BC-related biological entities, such as angiogenesis, apoptosis and oestrogen-related networks. The results also show that demographic risk factors are individually more important than genetic variants in predicting BC risk.
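The abstract reports results as mean average precision (mAP) but does not say how it is computed. The sketch below assumes the standard non-interpolated average precision (the mean of precision@k taken at the rank of each true case), averaged over several evaluation runs such as cross-validation folds and expressed as a percentage. The function names are hypothetical and this is not the authors' code.

```python
def average_precision(y_true, scores):
    """Non-interpolated AP: mean of precision@k at the rank of each positive (case)."""
    # Rank samples by descending predicted risk; Python's stable sort keeps ties in input order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` samples
    n_pos = sum(y_true)
    return precision_sum / n_pos if n_pos else 0.0


def mean_average_precision(runs):
    """Assumed mAP: average AP over (y_true, scores) pairs, e.g. one per CV fold, in percent."""
    aps = [average_precision(y, s) for y, s in runs]
    return 100.0 * sum(aps) / len(aps)


# Two toy folds: 1 = BC case, 0 = control; scores are predicted risks.
folds = [([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]), ([0, 1], [0.2, 0.6])]
print(round(mean_average_precision(folds), 2))  # 91.67
```

In the first toy fold the cases sit at ranks 1 and 3, giving AP = (1/1 + 2/3) / 2 ≈ 0.833; the second fold ranks its only case first (AP = 1.0), so the averaged score is about 91.67.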