López de Maturana Evangelina, Pineda Sílvia, Brand Angela, Van Steen Kristel, Malats Núria
Genetic and Molecular Epidemiology Group, Spanish National Cancer Research Centre (CNIO), Madrid, Spain.
Institute for Public Health Genomics, Maastricht University, Maastricht, Netherlands.
Genet Epidemiol. 2016 Nov;40(7):558-569. doi: 10.1002/gepi.21992. Epub 2016 Jul 18.
Primary and secondary prevention can benefit greatly from a personalized medicine approach through the accurate discrimination of individuals at high risk of developing a specific disease from those at moderate and low risk. To this end, precise risk prediction models need to be built. This endeavor requires a precise characterization of the individual exposome, genome, and phenome. Massive molecular omics data representing the different layers of the biological processes of the host and the nonhost will enable more accurate risk prediction models to be built. Epidemiologists aim to integrate omics data with important information from other sources (questionnaires, candidate markers) that has been proven relevant to discriminating risk in complex diseases. However, integrative models in large-scale epidemiologic research are still in their infancy and face numerous challenges, some of them at the analytical stage. So far, only a small number of studies have integrated more than two omics data sets, and most studies still do not include non-omics data in the same models. In this contribution, we approach omics and non-omics data integration from an epidemiological perspective by considering the "massive" inclusion of variables in risk assessment and predictive models. We also provide available examples of integrative contributions in the field, propose analytical strategies that allow both omics and non-omics data to be considered in the models, and finally review the challenges inherent in this type of research.