Defining window-boundaries for genomic analyses using smoothing spline techniques.

Author information

Beissinger Timothy M, Rosa Guilherme J M, Kaeppler Shawn M, Gianola Daniel, de Leon Natalia

Affiliations

Department of Plant Sciences, University of California, Davis, 95616, USA.

Department of Animal Sciences, University of Wisconsin, Madison, 53706, USA.

Publication information

Genet Sel Evol. 2015 Apr 17;47(1):30. doi: 10.1186/s12711-015-0105-9.

DOI:10.1186/s12711-015-0105-9

PMID:25928167

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4404117/

Abstract

BACKGROUND

High-density genomic data is often analyzed by combining information over windows of adjacent markers. Interpretation of data grouped in windows versus at individual locations may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, use of adjacent marker information can result in over- or under-smoothing, undesirable window boundary specifications, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. This method involves first fitting a cubic smoothing spline to the data and then identifying the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium, and therefore can be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, an arbitrary choice of window size is not necessary, since these are determined empirically and allowed to vary along the genome.

RESULTS

Simulations applying this method were performed to identify selection signatures from pooled sequencing FST data, for which allele frequencies were estimated from a pool of individuals. The relative ratio of true to false positives was twice that generated by existing techniques. A comparison of the approach to a previous study that involved pooled sequencing FST data from maize suggested that outlying windows were more clearly separated from their neighbors than when using a standard sliding window approach.

CONCLUSIONS

We have developed a novel technique to identify window boundaries for subsequent analysis protocols. When applied to selection studies based on FST data, this method provides a high discovery rate and minimizes false positives. The method is implemented in the R package GenWin, which is publicly available from CRAN.
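
The two steps summarized in the abstract (fit a cubic smoothing spline, then use its inflection points as window boundaries) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not GenWin's implementation or API: the function names (`solve_linear`, `smoothing_spline`, `inflection_points`, `windows_from_boundaries`), the fixed smoothing parameter `lam`, and the toy data are all hypothetical. The fit is a natural cubic smoothing spline obtained from the standard Reinsch linear system. Its second derivative is piecewise linear between data positions, so inflection points are taken where that second derivative changes sign.

```python
import math


def solve_linear(a, b):
    """Solve the dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def smoothing_spline(x, y, lam):
    """Natural cubic smoothing spline minimizing sum((y - g)^2) + lam * integral(g''^2).

    Returns (g, d2): fitted values and second derivatives at every position in x.
    x must be strictly increasing with at least three points.
    """
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    k = n - 2  # number of interior positions
    q = [[0.0] * k for _ in range(n)]  # second-difference operator
    r = [[0.0] * k for _ in range(k)]  # tridiagonal roughness matrix
    for c in range(k):
        j = c + 1
        q[j - 1][c] = 1.0 / h[j - 1]
        q[j][c] = -1.0 / h[j - 1] - 1.0 / h[j]
        q[j + 1][c] = 1.0 / h[j]
        r[c][c] = (h[j - 1] + h[j]) / 3.0
        if c + 1 < k:
            r[c][c + 1] = r[c + 1][c] = h[j] / 6.0
    # Reinsch system: (R + lam * Q^T Q) gamma = Q^T y
    qty = [sum(q[i][c] * y[i] for i in range(n)) for c in range(k)]
    a = [[r[c][d] + lam * sum(q[i][c] * q[i][d] for i in range(n))
          for d in range(k)] for c in range(k)]
    gamma = solve_linear(a, qty)
    g = [y[i] - lam * sum(q[i][c] * gamma[c] for c in range(k)) for i in range(n)]
    d2 = [0.0] + gamma + [0.0]  # natural spline: zero curvature at both ends
    return g, d2


def inflection_points(x, d2):
    """Positions where the piecewise-linear second derivative strictly changes sign."""
    points = []
    for i in range(len(x) - 1):
        a, b = d2[i], d2[i + 1]
        if a * b < 0:
            points.append(x[i] + (x[i + 1] - x[i]) * a / (a - b))
    return points


def windows_from_boundaries(start, end, boundaries):
    """Turn sorted boundary positions into consecutive (left, right) windows."""
    edges = [start] + list(boundaries) + [end]
    return list(zip(edges[:-1], edges[1:]))


# Toy statistic (made-up data): a single smooth peak centred at position 20.
positions = [float(i) for i in range(41)]
values = [math.exp(-((p - 20.0) / 4.0) ** 2) for p in positions]
fitted, second_deriv = smoothing_spline(positions, values, lam=1.0)
boundaries = inflection_points(positions, second_deriv)
windows = windows_from_boundaries(positions[0], positions[-1], boundaries)
print(boundaries)
print(windows)
```

A larger `lam` gives a smoother fit and typically fewer, wider windows. The abstract does not say how the smoothing parameter is chosen, so the value used here is arbitrary, and the dense solver is only suitable for small inputs.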

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63e9/4404117/dd95a271db47/12711_2015_105_Fig1_HTML.jpg
