Seffens William, Evans Chad, Taylor Herman
Physiology Department, Morehouse School of Medicine, Atlanta, GA, USA.
Director of Cardiovascular Research Institute (CVRI), Morehouse School of Medicine, Atlanta, GA, USA.
Bioinform Biol Insights. 2016 May 9;9(Suppl 3):43-54. doi: 10.4137/BBI.S29473. eCollection 2015.
Health-care initiatives are pushing the development and use of clinical data for medical discovery and translational research. Machine learning tools built for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data from a major clinical study, the Minority Health Genomics and Translational Research Repository Database, which is composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies of hypertension in AAs presumed that the increased disease burden in susceptible populations is due to rare variants. However, genomic analyses of hypertension, even those designed to focus on rare variants, have yielded only marginal genome-wide results across many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships among complex phenotypes, genotypes, and clinical data. We trained neural networks on phenotype data to impute missing values and thereby increase the usable size of a clinical data set. We established validity by showing how the expanded data set affected performance when associating phenotype variables with the case/control status of patients. Data mining classification tools were used to generate association rules.
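As an illustration of the imputation step described in the abstract, the following is a minimal sketch of neural-network imputation of a missing phenotype value. It is not the authors' implementation: the abstract does not specify the network architecture, the training procedure, or the variables used. The function names (train_mlp, impute_column), the one-hidden-layer tanh network, and the toy rows (three pre-scaled columns standing in for age, body mass index, and systolic blood pressure) are all assumptions made for this example, and the sketch assumes that only the target column contains missing entries.

```python
import math
import random


def train_mlp(X, y, hidden=4, epochs=2000, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh network to (X, y) with plain SGD.

    Hypothetical helper for illustration; returns a predict(x) function.
    """
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(xi):
        # Hidden activations and the scalar network output.
        h = [math.tanh(sum(w * x for w, x in zip(W1[j], xi)) + b1[j])
             for j in range(hidden)]
        return h, sum(W2[j] * h[j] for j in range(hidden)) + b2

    for _ in range(epochs):
        for xi, yi in zip(X, y):
            h, out = forward(xi)
            err = out - yi  # gradient of 0.5 * squared error w.r.t. output
            for j in range(hidden):
                # Backpropagate through tanh before W2[j] is updated.
                grad_h = err * W2[j] * (1.0 - h[j] ** 2)
                W2[j] -= lr * err * h[j]
                for k in range(n_in):
                    W1[j][k] -= lr * grad_h * xi[k]
                b1[j] -= lr * grad_h
            b2 -= lr * err

    return lambda xi: forward(xi)[1]


def impute_column(rows, target):
    """Fill None entries in column `target` from the other columns.

    Assumes every other column is fully observed and already scaled.
    """
    def features(r):
        return [v for i, v in enumerate(r) if i != target]

    complete = [r for r in rows if r[target] is not None]
    predict = train_mlp([features(r) for r in complete],
                        [r[target] for r in complete])
    return [list(r) if r[target] is not None
            else [predict(features(r)) if i == target else v
                  for i, v in enumerate(r)]
            for r in rows]


# Synthetic, pre-scaled toy rows (not study data); None marks a missing value.
rows = [
    [0.2, 0.3, 0.25],
    [0.4, 0.5, 0.45],
    [0.6, 0.4, 0.50],
    [0.8, 0.7, 0.75],
    [0.5, 0.6, None],
    [0.3, 0.2, 0.25],
    [0.9, 0.8, None],
]
filled = impute_column(rows, target=2)
```

Here the network is trained only on rows where the target column is observed, and observed values are copied through unchanged. In the spirit of the validation described in the abstract, one would then compare how well a case/control classifier performs on the original versus the expanded data set.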