Department of Medicine, Institute of Genomics and Systems Biology, University of Chicago, Chicago, IL, 60637, USA.
Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia.
Nat Commun. 2019 Dec 3;10(1):5508. doi: 10.1038/s41467-019-13455-0.
Typically, estimating genetic parameters, such as disease heritability and between-disease genetic correlations, demands large datasets containing all relevant phenotypic measures and detailed knowledge of family relationships or, alternatively, genotypic and phenotypic data for numerous unrelated individuals. Here, we suggest an alternative, efficient estimation approach through the construction of two disease metrics from large health datasets: temporal disease prevalence curves and low-dimensional disease embeddings. We present eleven thousand heritability estimates corresponding to five study types: twins, traditional family studies, health records-based family studies, single nucleotide polymorphisms, and polygenic risk scores. We also compute over six hundred thousand estimates of genetic, environmental and phenotypic correlations. Furthermore, we find that: (1) disease curve shapes cluster into five general patterns; (2) early-onset diseases tend to have lower prevalence than late-onset diseases (Spearman's ρ = 0.32, p < 10); and (3) the disease onset age and heritability are negatively correlated (ρ = -0.46, p < 10).
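Findings (2) and (3) are reported as Spearman rank correlations. As a minimal sketch of how such a coefficient is computed, the stdlib-only Python below ranks two variables (averaging tied ranks) and takes the Pearson correlation of the ranks. The function names and the toy onset-age and prevalence values are hypothetical illustrations, not the study's data or code, and the sketch does not compute a p-value.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values (NOT from the paper): onset age in years and
# population prevalence for six made-up diseases.
onset_age = [5, 12, 30, 45, 60, 70]
prevalence = [0.001, 0.004, 0.002, 0.02, 0.015, 0.05]

if __name__ == "__main__":
    print(f"Spearman's rho = {spearman_rho(onset_age, prevalence):.3f}")
```

A positive coefficient here would correspond to the direction reported in finding (2), where later onset goes with higher prevalence; a negative one would correspond to the direction in finding (3).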