Sha Jiahang, Bao Jingxuan, Liu Kefei, Yang Shu, Wen Zixuan, Cui Yuhan, Wen Junhao, Davatzikos Christos, Moore Jason H, Saykin Andrew J, Long Qi, Shen Li
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, USA.
Suzhou Institute for Advanced Research, University of Science and Technology of China, Suzhou, China.
Proceedings (IEEE Int Conf Bioinformatics Biomed). 2022 Dec;2022:541-548. doi: 10.1109/bibm55620.2022.9995342.
Investigating the relationship between genetic variation and phenotypic traits is a key issue in quantitative genetics. For Alzheimer's disease in particular, the association between genetic markers and quantitative traits remains unclear; once identified, it would provide valuable guidance for the study and development of genetics-based treatment approaches. To analyze the association between two modalities, sparse canonical correlation analysis (SCCA) is commonly used: it computes one sparse linear combination of the features of each modality, yielding a pair of weight vectors that maximizes the cross-correlation between the modalities. One drawback of the plain SCCA model is that existing findings and knowledge cannot be integrated into it as priors to help extract correlations of interest and identify biologically meaningful genetic and phenotypic markers. To bridge this gap, we introduce preference matrix guided SCCA (PM-SCCA), which takes priors encoded as a preference matrix while maintaining computational simplicity. A simulation study and a real-data experiment are conducted to investigate the effectiveness of the model. Both experiments demonstrate that the proposed PM-SCCA model effectively captures not only the genotype-phenotype correlation but also the relevant features.
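To make the plain SCCA baseline described above concrete, the following is a minimal sketch of one common way to fit it: alternating soft-thresholded updates on the sample cross-covariance matrix, which treats the within-modality covariances as identity. It is not the authors' PM-SCCA, and it does not use a preference matrix, whose form the abstract does not specify. The function names (`soft_threshold`, `scca`), the threshold parameters `lam_u` and `lam_v`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink each entry of `a` toward zero by `lam` (L1 proximal step)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def _unit(w):
    """Scale `w` to unit Euclidean norm; leave an all-zero vector as is."""
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w


def scca(X, Y, lam_u=0.1, lam_v=0.1, n_iter=100, tol=1e-6):
    """Plain SCCA sketch (hypothetical helper, not the PM-SCCA model).

    X: (n, p) array, e.g. genotype features; Y: (n, q) array, e.g.
    quantitative traits. Columns are assumed to be non-constant.
    Returns sparse weights u (p,), v (q,) and the sample correlation
    between the canonical variables X @ u and Y @ v. If the thresholds
    are too large, the weights collapse to zero and the correlation
    is undefined (NaN).
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = X.T @ Y / X.shape[0]  # (p, q) sample cross-correlation matrix

    # Start v at the leading right singular vector of C.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    v = Vt[0]
    u = np.zeros(X.shape[1])

    for _ in range(n_iter):
        u_old, v_old = u, v
        u = _unit(soft_threshold(C @ v, lam_u))    # update u with v fixed
        v = _unit(soft_threshold(C.T @ u, lam_v))  # update v with u fixed
        if np.linalg.norm(u - u_old) + np.linalg.norm(v - v_old) < tol:
            break

    corr = np.corrcoef(X @ u, Y @ v)[0, 1]
    return u, v, corr


if __name__ == "__main__":
    # Toy data: one latent factor drives 3 columns of X and 2 columns of Y.
    rng = np.random.default_rng(0)
    z = rng.normal(size=200)
    X = rng.normal(size=(200, 20))
    Y = rng.normal(size=(200, 15))
    X[:, :3] = z[:, None] + 0.5 * rng.normal(size=(200, 3))
    Y[:, :2] = z[:, None] + 0.5 * rng.normal(size=(200, 2))
    u, v, corr = scca(X, Y, lam_u=0.3, lam_v=0.3)
    print("nonzero u:", np.flatnonzero(u), "nonzero v:", np.flatnonzero(v))
    print("canonical correlation:", corr)
```

In this sketch sparsity comes only from the fixed thresholds, so the selected features are driven entirely by the data. A prior-guided variant such as PM-SCCA would need an additional term that favors features indicated by existing knowledge; how that term is constructed is defined in the paper, not here.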