Fan Jun, Wu Yirong, Yuan Ming, Page David, Liu Jie, Ong Irene M, Peissig Peggy, Burnside Elizabeth
Department of Statistics, University of Wisconsin-Madison, 1300 University Avenue, Madison, WI 53706, United States
Department of Radiology, University of Wisconsin-Madison, 600 Highland Avenue, Madison, WI 53792, United States
J Mach Learn Res. 2016 Dec;17.
Predicting breast cancer risk has long been a goal of medical research in the pursuit of precision medicine. The goal of this study is to develop novel penalized methods to improve breast cancer risk prediction by leveraging structural information in electronic health records. We conducted a retrospective case-control study, garnering 49 mammography descriptors and 77 high-frequency/low-penetrance single-nucleotide polymorphisms (SNPs) from an existing personalized medicine data repository. Structured mammography reports and breast imaging features have long been part of a standard electronic health record (EHR), and genetic markers likely will be in the near future. Lasso and its variants are widely used approaches to integrated learning and feature selection, and our methodological contribution is to incorporate the dependence structure among the features into these approaches. More specifically, we propose a new methodology that combines a group penalty with an $\ell_p$ ($1 \le p \le 2$) fusion penalty to improve breast cancer risk prediction, taking into account structural information in mammography descriptors and SNPs. We demonstrate that our method provides benefits that are both statistically significant and potentially significant to people's lives.
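The abstract describes the combined penalty only in words, so the following is a minimal sketch of one plausible form, not the authors' implementation. It assumes a logistic loss, a group-lasso term weighted by the square root of the group size, and a fusion term that sums |beta_j - beta_k|^p over pairs of related features. The function names, the `groups` and `edges` inputs, and the tuning parameters `lam_group` and `lam_fuse` are all hypothetical.

```python
import numpy as np

def group_penalty(beta, groups):
    """Assumed group-lasso term: sum over groups of sqrt(size) * Euclidean norm."""
    return sum(np.sqrt(len(g)) * np.linalg.norm(beta[g]) for g in groups)

def fusion_penalty(beta, edges, p):
    """Assumed fusion term: sum over linked pairs (j, k) of |beta_j - beta_k|**p."""
    if not 1.0 <= p <= 2.0:
        raise ValueError("p must lie in [1, 2]")
    return sum(abs(beta[j] - beta[k]) ** p for j, k in edges)

def penalized_logistic_loss(beta, X, y, groups, edges, lam_group, lam_fuse, p):
    """Mean logistic negative log-likelihood plus both penalty terms (y in {0, 1})."""
    z = X @ beta
    # log(1 + exp(z)) - y * z, computed stably with logaddexp
    nll = np.mean(np.logaddexp(0.0, z) - y * z)
    return (nll
            + lam_group * group_penalty(beta, groups)
            + lam_fuse * fusion_penalty(beta, edges, p))

# Toy illustration: four features in two groups, one linked pair of features.
beta = np.array([3.0, 4.0, 0.0, 1.0])
groups = [[0, 1], [2, 3]]   # e.g. descriptors that belong together (hypothetical)
edges = [(0, 2)]            # e.g. two related features (hypothetical)
```

With p = 1 the fusion term behaves like a fused-lasso penalty that can make linked coefficients exactly equal, while p = 2 gives a smoother, ridge-like shrinkage of their difference; intermediate values interpolate between the two.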