Department of Biosystems, Livestock Genetics, KU Leuven, Kasteelpark Arenberg 30 - Box 2472, 3001, Leuven, Belgium.
BMC Genomics. 2020 Jan 29;21(1):94. doi: 10.1186/s12864-020-6463-x.
PLINK is probably the most widely used program for analyzing SNP genotypes and runs of homozygosity (ROH), in both human and animal populations. Over the last decade, ROH analysis has become the state-of-the-art method for inbreeding assessment. In PLINK, the --homozyg function performs ROH analyses and relies on several input settings. These settings can have a large impact on the outcome, and the default values are not always appropriate for medium-density SNP array data. Guidelines for a robust and uniform ROH analysis in PLINK using medium-density data are lacking, although such guidelines are vital for comparing different ROH studies. In this study, 8 populations of different livestock and pet species are used to demonstrate the importance of PLINK input settings. Moreover, the effects of pruning SNPs for low minor allele frequency and linkage disequilibrium on ROH detection are shown.
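To make the role of the input settings concrete, the following is a minimal sketch of how a --homozyg run could be assembled with every setting stated explicitly instead of left at its default. The flag names are PLINK 1.9 options; the helper name `build_roh_command`, the file prefixes and all numeric values are hypothetical placeholders, not the settings recommended in this study.

```python
import shlex


def build_roh_command(bfile, out, *, min_snps, min_kb, density_kb_per_snp,
                      max_gap_kb, window_snps, window_het, window_missing,
                      window_threshold, extra_args=()):
    """Return the argument list for a PLINK 1.9 --homozyg run.

    Every ROH setting is a required keyword argument, so no PLINK default
    is relied on silently and the full set can be reported alongside results.
    """
    return [
        "plink", "--bfile", bfile,
        *extra_args,                                     # e.g. a species flag
        "--homozyg",
        "--homozyg-snp", str(min_snps),                  # min. SNPs per ROH
        "--homozyg-kb", str(min_kb),                     # min. ROH length (kb)
        "--homozyg-density", str(density_kb_per_snp),    # max. kb per SNP
        "--homozyg-gap", str(max_gap_kb),                # max. gap between SNPs (kb)
        "--homozyg-window-snp", str(window_snps),        # scanning window size
        "--homozyg-window-het", str(window_het),         # het. calls allowed per window
        "--homozyg-window-missing", str(window_missing), # missing calls allowed per window
        "--homozyg-window-threshold", str(window_threshold),
        "--out", out,
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    cmd = build_roh_command(
        "mydata", "roh_run",
        min_snps=30, min_kb=1000, density_kb_per_snp=150, max_gap_kb=500,
        window_snps=30, window_het=0, window_missing=1, window_threshold=0.05,
    )
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run to execute it
```

For non-human data, PLINK typically also needs a species or chromosome-set flag (for example `--cow` or `--chr-set`), which can be supplied through the hypothetical `extra_args` parameter.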
We introduce the genome coverage parameter to appropriately estimate the inbreeding coefficient F and to check the validity of ROH analyses. The effect of pruning for linkage disequilibrium and low minor allele frequency on ROH analyses is highly population dependent, and such pruning may result in missed ROH. PLINK's minimal density requirement is crucial for medium-density genotypes: if set too low, the genome coverage of the ROH analysis is limited. Finally, we provide recommendations for the maximal gap, scanning window length and threshold settings.
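The abstract does not spell out how genome coverage enters the estimate of F. As a sketch under a stated assumption, suppose F is the summed ROH length of an individual divided by the length of genome in which ROH can be detected under the chosen settings, and genome coverage is that detectable length as a fraction of the total genome length. The function names and the numbers below are illustrative, not taken from the study.

```python
def genome_coverage(covered_kb, total_genome_kb):
    """Fraction of the genome in which ROH can be detected (assumed definition)."""
    if total_genome_kb <= 0:
        raise ValueError("total_genome_kb must be positive")
    return covered_kb / total_genome_kb


def f_roh(roh_lengths_kb, covered_kb):
    """Inbreeding estimate: summed ROH length over the covered genome length.

    Assumption: the denominator is the covered length, not the full genome
    length, so limited coverage does not bias F downwards.
    """
    if covered_kb <= 0:
        raise ValueError("covered_kb must be positive")
    return sum(roh_lengths_kb) / covered_kb


if __name__ == "__main__":
    # Made-up numbers: three ROH totalling 20,000 kb in one individual.
    roh = [5000, 12000, 3000]
    covered = 2_000_000   # kb of genome where ROH detection is possible
    total = 2_500_000     # kb of total autosomal genome length
    print(f"coverage = {genome_coverage(covered, total):.2f}")
    print(f"F        = {f_roh(roh, covered):.4f}")
```

Reporting the coverage value next to F lets readers judge how much of the genome the ROH analysis was actually able to see.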
In this study, we present guidelines for an adequate and robust ROH analysis in PLINK on medium-density SNP data. Furthermore, we advise authors to report parameter settings in publications and to validate them prior to analysis. Moreover, we encourage authors to report genome coverage to reflect the validity of the ROH analysis. Implementing these guidelines will substantially improve the overall quality and uniformity of ROH analyses.