Defining window-boundaries for genomic analyses using smoothing spline techniques.

Author information

Beissinger Timothy M, Rosa Guilherme J M, Kaeppler Shawn M, Gianola Daniel, de Leon Natalia

Affiliations

Department of Plant Sciences, University of California, Davis, 95616, USA.

Department of Animal Sciences, University of Wisconsin, Madison, 53706, USA.

Publication information

Genet Sel Evol. 2015 Apr 17;47(1):30. doi: 10.1186/s12711-015-0105-9.

Abstract

BACKGROUND

High-density genomic data is often analyzed by combining information over windows of adjacent markers. Interpretation of data grouped in windows versus at individual locations may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, use of adjacent marker information can result in over- or under-smoothing, undesirable window boundary specifications, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. This method involves first fitting a cubic smoothing spline to the data and then identifying the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium, and therefore can be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, an arbitrary choice of window size is not necessary, since window sizes are determined empirically and allowed to vary along the genome.
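
The two steps described above (fit a cubic smoothing spline, then cut at its inflection points) can be illustrated with a short Python sketch. This is not the GenWin implementation. It assumes a natural cubic smoothing spline solved in the standard Reinsch form, a hand-picked roughness penalty `alpha` (the abstract does not say how smoothness is chosen), and a noise-free sine curve standing in for per-marker statistics. All function and variable names are hypothetical.

```python
import numpy as np

def smoothing_spline_second_derivs(x, y, alpha):
    """Fit a natural cubic smoothing spline (Reinsch form).

    x must be strictly increasing; alpha >= 0 is an assumed, hand-picked
    roughness penalty. Returns the fitted values and the spline's second
    derivative at every knot (zero at both ends by construction).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    h = np.diff(x)                          # knot spacings, length n - 1
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(n - 2):                  # column j <-> interior knot j + 1
        Q[j, j] = 1.0 / h[j]
        Q[j + 1, j] = -1.0 / h[j] - 1.0 / h[j + 1]
        Q[j + 2, j] = 1.0 / h[j + 1]
        R[j, j] = (h[j] + h[j + 1]) / 3.0
        if j + 1 < n - 2:
            R[j, j + 1] = R[j + 1, j] = h[j + 1] / 6.0
    # Second derivatives at the interior knots solve a banded linear system.
    gamma = np.linalg.solve(R + alpha * (Q.T @ Q), Q.T @ y)
    fitted = y - alpha * (Q @ gamma)
    d2 = np.concatenate(([0.0], gamma, [0.0]))
    return fitted, d2

def inflection_points(x, d2):
    """Locate sign changes of the second derivative.

    The second derivative of a cubic spline is linear between knots, so each
    zero crossing is found exactly by linear interpolation.
    """
    x = np.asarray(x, dtype=float)
    a, b = d2[:-1], d2[1:]
    idx = np.nonzero(a * b < 0)[0]          # segments where the sign flips
    return x[idx] + (x[idx + 1] - x[idx]) * a[idx] / (a[idx] - b[idx])

def windows_from_boundaries(x, boundaries):
    """Turn boundary positions into (start, end) windows tiling the range."""
    edges = np.concatenate(([x[0]], boundaries, [x[-1]]))
    return list(zip(edges[:-1], edges[1:]))

# Toy input: a sine curve sampled at 80 positions (a stand-in for real data).
positions = np.linspace(0.0, 4.0 * np.pi, 80)
signal = np.sin(positions)
fitted, d2 = smoothing_spline_second_derivs(positions, signal, alpha=1e-3)
boundaries = inflection_points(positions, d2)
windows = windows_from_boundaries(positions, boundaries)
```

Because the boundaries come from the fitted curve rather than from a fixed step, the resulting windows differ in width wherever the curvature of the signal changes at uneven intervals. With real, noisy data the choice of `alpha` matters much more than in this toy case, since too little smoothing produces many spurious inflection points.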

RESULTS

Simulations applying this method were performed to identify selection signatures from pooled sequencing FST data, for which allele frequencies were estimated from a pool of individuals. The relative ratio of true to false positives was twice that generated by existing techniques. A comparison of the approach to a previous study that involved pooled sequencing FST data from maize suggested that outlying windows were more clearly separated from their neighbors than when using a standard sliding window approach.
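
To make the idea of an "outlying window" concrete, here is a minimal Python sketch that groups per-marker values into windows and scores each window. The scoring rule (window mean minus overall mean, scaled by the overall variance and the number of markers in the window) is an assumption chosen for illustration, since the abstract does not specify the statistic used. The positions, values, and boundaries are made-up toy numbers, not data from the study.

```python
from bisect import bisect_right
from math import sqrt
from statistics import mean, pvariance

def window_scores(positions, values, boundaries):
    """Score each window by a standardized mean (illustrative choice only).

    boundaries must be sorted; a marker at position p falls in window k,
    where k is the number of boundaries at or below p.
    """
    overall_mean = mean(values)
    overall_var = pvariance(values)
    groups = {}
    for pos, val in zip(positions, values):
        idx = bisect_right(boundaries, pos)      # window index for this marker
        groups.setdefault(idx, []).append(val)
    scores = {}
    for idx, vals in sorted(groups.items()):
        n = len(vals)
        scores[idx] = (mean(vals) - overall_mean) / sqrt(overall_var / n)
    return scores

# Toy data: twelve markers, with elevated values in the middle window.
positions = list(range(1, 13))
values = [0.05, 0.04, 0.06, 0.05,
          0.40, 0.42, 0.38, 0.41,
          0.05, 0.06, 0.04, 0.05]
boundaries = [4.5, 8.5]                          # hypothetical window boundaries

scores = window_scores(positions, values, boundaries)
top_window = max(scores, key=scores.get)
```

Scaling by the number of markers in each window keeps wide and narrow windows comparable, which matters when window sizes vary along the genome.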

CONCLUSIONS

We have developed a novel technique to identify window boundaries for subsequent analysis protocols. When applied to selection studies based on FST data, this method provides a high discovery rate and minimizes false positives. The method is implemented in the R package GenWin, which is publicly available from CRAN.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63e9/4404117/dd95a271db47/12711_2015_105_Fig1_HTML.jpg