Machine learning identifies interacting genetic variants contributing to breast cancer risk: A case study in Finnish cases and controls.

Affiliations

Institute of Clinical Medicine, Pathology and Forensic Medicine, and Translational Cancer Research Area, University of Eastern Finland, P.O. Box 1627, FI-70211, Kuopio, Finland.

Institute of Clinical Medicine, Oncology, University of Eastern Finland, P.O. Box 1627, FI-70211, Kuopio, Finland.

Publication information

Sci Rep. 2018 Sep 3;8(1):13149. doi: 10.1038/s41598-018-31573-5.

Abstract

We propose an effective machine learning approach to identify a group of interacting single nucleotide polymorphisms (SNPs) that contribute most to breast cancer (BC) risk, assuming dependencies among BCAC iCOGS SNPs. We adopt a gradient tree boosting method followed by an adaptive iterative SNP search to capture complex non-linear SNP-SNP interactions and, consequently, obtain a group of interacting SNPs with high BC risk-predictive potential. We also propose a support vector machine built on the identified SNPs to classify BC cases and controls. Our approach achieves a mean average precision (mAP) of 72.66, 67.24 and 69.25 in discriminating BC cases from controls in the KBCP, OBCS and merged KBCP-OBCS sample sets, respectively. These results are better than the mAP of 70.08, 63.61 and 66.41 obtained with a polygenic risk score model derived from 51 known BC-associated SNPs in the KBCP, OBCS and merged KBCP-OBCS sample sets, respectively. BC subtype analysis further reveals that the 200 KBCP SNPs identified by the proposed method perform favorably in classifying estrogen receptor positive (ER+) and negative (ER-) BC cases in both KBCP and OBCS data. Further, a biological analysis of the identified SNPs reveals genes related to important BC-related mechanisms, estrogen metabolism and apoptosis.
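To make the comparison in the abstract concrete, the following is a minimal sketch of how a polygenic risk score can be computed and scored with average precision. It is not the authors' code. It assumes the common definition of a polygenic risk score as a weighted sum of risk-allele dosages, and it assumes that the reported mAP is average precision averaged over evaluation runs and expressed as a percentage; the abstract states neither. The weights, genotypes, and labels are hypothetical toy values.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per SNP).

    Assumption: weights play the role of per-SNP effect sizes
    (for example log odds ratios); the abstract does not specify them.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k taken at every rank k where a case (label 1) appears."""
    # Rank individuals from highest to lowest predicted risk.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("at least one case (label 1) is required")
    return precision_sum / hits


# Hypothetical toy data: 3 SNPs, 4 individuals (1 = case, 0 = control).
weights = [0.5, 0.25, 0.125]
genotypes = [[2, 1, 0], [1, 0, 1], [1, 1, 0], [0, 1, 0]]
labels = [1, 1, 0, 0]

scores = [polygenic_risk_score(g, weights) for g in genotypes]
ap = average_precision(labels, scores)
print(f"scores = {scores}, average precision = {ap:.4f}")
```

Under these assumptions, any classifier that outputs a per-individual risk score (a polygenic risk score or a support vector machine decision value) could be passed through the same `average_precision` function, which is what makes the two sets of mAP figures in the abstract comparable.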

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b110/6120908/b5bb3a2b17be/41598_2018_31573_Fig1_HTML.jpg
