Yuan Meng, Goovaerts Seppe, Hoskens Hanne, Richmond Stephen, Walsh Susan, Shriver Mark D, Shaffer John R, Marazita Mary L, Weinberg Seth M, Peeters Hilde, Claes Peter
bioRxiv. 2023 Aug 14:2023.08.13.553129. doi: 10.1101/2023.08.13.553129.
A genome-wide association study (GWAS) of a complex, multi-dimensional morphological trait, such as the human face, typically relies on predefined and simplified phenotypic measurements, such as inter-landmark distances and angles. These measures are predominantly designed by human experts based on perceived biological or clinical knowledge. To avoid the use of handcrafted phenotypes (i.e., a priori expert-identified phenotypes), alternative automatically extracted phenotypic descriptors, such as features derived from dimension reduction techniques (e.g., principal component analysis), are employed. While the features generated by such computational algorithms capture the geometric variations of the biological shape, they are not necessarily genetically relevant. Therefore, genetically informed data-driven phenotyping is desirable. Here, we propose an approach where phenotyping is done through a data-driven optimization of trait heritability, defined as the degree of variation in a phenotypic trait in a population that is due to genetic variation. The resulting phenotyping process consists of two steps: 1) constructing a feature space that models shape variations using dimension reduction techniques, and 2) searching for directions in the feature space that exhibit high trait heritability using a genetic search algorithm (i.e., a heuristic inspired by natural selection). We show that the phenotypes resulting from the proposed trait heritability-optimized training differ from principal components in the following respects: 1) higher trait heritability, 2) higher SNP heritability, and 3) identification of the same number of independent genetic loci with a smaller number of effective traits. Our results demonstrate that data-driven trait heritability-based optimization enables the automatic extraction of genetically relevant phenotypes, as shown by their increased power in genome-wide association scans.
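The two-step process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how trait heritability is estimated, so the fitness function below assumes full-sibling pairs and uses the textbook approximation h² ≈ 2 × (sibling trait correlation). The function names `pca_scores`, `sib_heritability`, and `genetic_search`, as well as all parameter defaults, are hypothetical.

```python
import numpy as np


def pca_scores(shapes, n_components):
    """Step 1: build a feature space by projecting centered shapes onto leading PCs."""
    centered = shapes - shapes.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def sib_heritability(direction, scores_a, scores_b):
    """Assumed fitness: h2 ~ 2 * correlation of the projected trait between full siblings."""
    direction = direction / np.linalg.norm(direction)
    trait_a = scores_a @ direction  # trait value of the first sibling in each pair
    trait_b = scores_b @ direction  # trait value of the second sibling in each pair
    return 2.0 * np.corrcoef(trait_a, trait_b)[0, 1]


def genetic_search(scores_a, scores_b, pop_size=40, n_generations=60,
                   mutation_scale=0.1, seed=0):
    """Step 2: search the feature space for a direction with high estimated heritability."""
    rng = np.random.default_rng(seed)
    n_dims = scores_a.shape[1]
    # Initial population: random unit-length directions.
    population = rng.normal(size=(pop_size, n_dims))
    population /= np.linalg.norm(population, axis=1, keepdims=True)
    best_dir, best_fit = None, -np.inf

    for _ in range(n_generations):
        fitness = np.array([sib_heritability(d, scores_a, scores_b)
                            for d in population])
        order = np.argsort(fitness)[::-1]  # best candidates first
        if fitness[order[0]] > best_fit:
            best_fit = fitness[order[0]]
            best_dir = population[order[0]].copy()

        # Selection: keep the top half as parents.
        parents = population[order[:pop_size // 2]]
        # Uniform crossover: each coordinate comes from one of two random parents.
        idx_a = rng.integers(0, len(parents), size=pop_size)
        idx_b = rng.integers(0, len(parents), size=pop_size)
        mask = rng.random((pop_size, n_dims)) < 0.5
        children = np.where(mask, parents[idx_a], parents[idx_b])
        # Mutation: small Gaussian perturbation, then renormalize to unit length.
        children = children + rng.normal(scale=mutation_scale, size=children.shape)
        children /= np.linalg.norm(children, axis=1, keepdims=True)
        # Elitism: carry the current best candidate over unchanged.
        children[0] = population[order[0]]
        population = children

    return best_dir, best_fit
```

In this sketch the PCA would be fitted on all individuals jointly, after which the score matrix is split into two row-aligned matrices (one row per sibling pair) and passed to `genetic_search`. Any other heritability estimator could replace `sib_heritability` without changing the search loop.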