Research Center for Advanced Analysis, National Agriculture and Food Research Organization, Tsukuba, Ibaraki, Japan.
BMC Bioinformatics. 2023 Apr 5;24(1):135. doi: 10.1186/s12859-023-05169-4.
Population structure and cryptic relatedness between individuals (samples) are two major factors contributing to false positives in genome-wide association studies (GWAS). In addition, population stratification and genetic relatedness can affect prediction accuracy in genomic selection in animal and plant breeding. The methods commonly used to address these problems are principal component analysis (to adjust for population stratification) and marker-based kinship estimates (to correct for the confounding effects of genetic relatedness). Many tools and software packages are currently available for analyzing genetic variation among individuals to determine population structure and genetic relationships. However, none of them performs all of these analyses in a single workflow and visualizes all of the results in a single interactive web application.
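Both corrections named above start from a genotype matrix. As a minimal, self-contained sketch (not PSReliP's or PLINK's code), the Python below builds a variance-standardized genomic relationship matrix from 0/1/2 alternate-allele counts and extracts its leading principal component by power iteration. The function names, the toy genotypes and the assumption of no missing calls are illustrative only; in practice this step is delegated to PLINK, and a numerical library would replace the hand-written eigen solver.

```python
import math
import random


def standardized_grm(genotypes):
    """Variance-standardized genomic relationship matrix (GRM).

    genotypes: samples x variants list of 0/1/2 alt-allele counts.
    Assumption of this sketch: there are no missing calls.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        p = sum(row[j] for row in genotypes) / (2 * n)  # alt-allele frequency
        if 0.0 < p < 1.0:  # monomorphic variants carry no information
            sd = math.sqrt(2 * p * (1 - p))
            columns.append([(row[j] - 2 * p) / sd for row in genotypes])
    if not columns:
        raise ValueError("no polymorphic variants")
    m = len(columns)
    # GRM[i][k] = (1/M) * sum_j z_ij * z_kj over the M retained variants
    return [[sum(col[i] * col[k] for col in columns) / m for k in range(n)]
            for i in range(n)]


def top_principal_component(grm, iterations=500, seed=0):
    """Leading eigenvalue and eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(grm)
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(iterations):
        w = [sum(grm[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v (v has unit length)
    eigenvalue = sum(v[i] * grm[i][k] * v[k] for i in range(n) for k in range(n))
    return eigenvalue, v


# Hypothetical toy data: samples 0-2 and 3-5 come from two differentiated groups.
genotypes = [
    [2, 2, 0, 0, 1],
    [2, 1, 0, 0, 1],
    [2, 2, 0, 1, 1],
    [0, 0, 2, 2, 1],
    [0, 1, 2, 2, 1],
    [1, 0, 2, 2, 1],
]
grm = standardized_grm(genotypes)
eigenvalue, pc1 = top_principal_component(grm)
print([round(x, 2) for x in pc1])  # the two groups get opposite signs on PC1
```

Samples whose PC1 scores fall on opposite sides of zero belong to different clusters. Adding the leading PCs as covariates is the usual way such scores are used to adjust an association model for stratification.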
We developed PSReliP, a standalone, freely available pipeline for analyzing and visualizing population structure and relatedness between individuals in a user-specified genetic variant dataset. The analysis stage of PSReliP executes all data filtering and analysis steps. It consists of an ordered sequence of commands from PLINK, a whole-genome association analysis toolset, together with in-house shell scripts and Perl programs that support data pipelining. The visualization stage is provided by Shiny apps, which are R-based interactive web applications. In this study, we describe the characteristics and features of PSReliP and demonstrate how it can be applied to real genome-wide genetic variant data.
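The filtering stage can be pictured with a short sketch. The Python below applies three common quality-control filters in a fixed order: sample missingness first, then variant missingness and minor allele frequency (MAF). These correspond conceptually to PLINK's `--mind`, `--geno` and `--maf` options, but the function name, the thresholds and the order of steps are hypothetical; the abstract does not specify which filters PSReliP applies.

```python
def qc_filter(genotypes, max_sample_missing=0.1, max_variant_missing=0.1, min_maf=0.05):
    """Return (kept_sample_indices, kept_variant_indices).

    genotypes: per-sample lists of 0/1/2 alt-allele counts; None marks a missing call.
    All thresholds are illustrative defaults, not PSReliP settings.
    """
    n_variants = len(genotypes[0])
    # Step 1: drop samples whose missing-call rate exceeds the threshold.
    samples = [i for i, row in enumerate(genotypes)
               if sum(g is None for g in row) / n_variants <= max_sample_missing]
    if not samples:
        return [], []
    # Step 2: among kept samples, drop variants with high missingness or low MAF.
    variants = []
    for j in range(n_variants):
        calls = [genotypes[i][j] for i in samples if genotypes[i][j] is not None]
        missing_rate = 1 - len(calls) / len(samples)
        if not calls or missing_rate > max_variant_missing:
            continue
        alt_freq = sum(calls) / (2 * len(calls))
        if min(alt_freq, 1 - alt_freq) >= min_maf:
            variants.append(j)
    return samples, variants


# Hypothetical toy dataset: 5 samples x 5 variants.
data = [
    [0, 1, 2, 0, 1],
    [1, 1, None, 0, 2],           # 20% missing
    [2, 0, 1, 0, 1],
    [None, None, None, 0, None],  # 80% missing: removed
    [1, 2, 1, 0, 0],
]
kept_samples, kept_variants = qc_filter(data, max_sample_missing=0.2)
print(kept_samples, kept_variants)  # [0, 1, 2, 4] [0, 1, 4]
```

Variant 2 is removed for missingness (25% among the kept samples) and variant 3 is monomorphic, so its MAF of 0 falls below the cutoff. Running sample filtering first means a few very poor samples cannot push otherwise good variants over the missingness threshold.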
The PSReliP pipeline allows users to quickly analyze genome-wide genetic variants, such as single nucleotide polymorphisms and small insertions or deletions, to estimate population structure and cryptic relatedness with PLINK, and to visualize the results in interactive tables, plots, and charts built with Shiny. Assessing population stratification and genetic relatedness can help in choosing an appropriate approach for the statistical analysis of GWAS data and for prediction in genomic selection. The various PLINK outputs can also be used for further downstream analysis. The code and manual for PSReliP are available at https://github.com/solelena/PSReliP .
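For cryptic relatedness, one widely used marker-based estimator is the KING-robust kinship coefficient, which PLINK 2 implements (`--make-king`). The abstract does not say which estimator PSReliP reports, so the following is only an illustration. This sketch uses the homogeneous-population form, phi = (N_Aa,Aa - 2 * N_AA,aa) / (N_Aa(i) + N_Aa(j)), where N_Aa,Aa counts variants at which both samples are heterozygous, N_AA,aa counts variants with opposite homozygotes, and N_Aa(i) is the heterozygote count of sample i. The function names and genotypes are hypothetical.

```python
def king_robust_kinship(g_i, g_j):
    """KING-robust kinship between two samples (homogeneous-population form).

    g_i, g_j: 0/1/2 alt-allele counts at the same variants; None = missing.
    """
    het_i = het_j = het_both = opposite_hom = 0
    for a, b in zip(g_i, g_j):
        if a is None or b is None:
            continue  # only variants called in both samples are used
        het_i += a == 1
        het_j += b == 1
        het_both += a == 1 and b == 1
        opposite_hom += {a, b} == {0, 2}  # AA in one sample, aa in the other
    if het_i + het_j == 0:
        raise ValueError("no heterozygous calls; kinship is undefined")
    return (het_both - 2 * opposite_hom) / (het_i + het_j)


def relationship_degree(phi):
    """Classify a kinship coefficient with the cutoffs suggested by the KING authors."""
    if phi > 2 ** -1.5:   # ~0.354
        return "duplicate/MZ twin"
    if phi > 2 ** -2.5:   # ~0.177
        return "1st degree"
    if phi > 2 ** -3.5:   # ~0.088
        return "2nd degree"
    if phi > 2 ** -4.5:   # ~0.044
        return "3rd degree"
    return "unrelated"


sample = [0, 1, 2, 1, 1, 0, 2, 1]
phi = king_robust_kinship(sample, sample)
print(phi, relationship_degree(phi))  # 0.5 duplicate/MZ twin
```

A duplicated sample gives exactly 0.5, and values near zero or below indicate no close relationship. Because this form assumes a homogeneous population, pairs drawn from different subpopulations can yield biased, often negative, values, which is one reason to examine population structure and relatedness together.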