Biomedical Informatics Training Program, Stanford University, Stanford, California.
Department of Dermatology, Stanford School of Medicine, Stanford, California.
Hum Mutat. 2019 Sep;40(9):1314-1320. doi: 10.1002/humu.23825. Epub 2019 Jun 24.
Genetics plays a key role in venous thromboembolism (VTE) risk; however, established risk factors in European populations do not translate to individuals of African descent because of differences in allele frequencies between populations. As part of the fifth iteration of the Critical Assessment of Genome Interpretation, participants were asked to predict VTE status from exome data of African American subjects. Participants were provided with 103 unlabeled exomes from patients treated with warfarin either for VTE or for non-VTE causes, and were asked to predict which condition each subject had been treated for. Given the lack of training data, many participants opted to use unsupervised machine learning methods, clustering the exomes by variation in genes known to be associated with VTE. The best-performing method using only VTE-related genes achieved an area under the ROC curve of 0.65. Here, we discuss the range of methods used in the prediction of VTE from sequence data and explore some of the difficulties of conducting a challenge with known confounders. In addition, we show that an existing genetic risk score for VTE that was developed in European subjects works well in African Americans.