
A strategy to improve phasing of whole-genome sequenced individuals through integration of familial information from dense genotype panels.

Authors

Faux Pierre, Druet Tom

Affiliation

Unit of Animal Genomics, GIGA-R and Faculty of Veterinary Medicine, University of Liège, 4000, Liège, Belgium.

Publication details

Genet Sel Evol. 2017 May 16;49(1):46. doi: 10.1186/s12711-017-0321-6.

DOI:10.1186/s12711-017-0321-6
PMID:28511677
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5434521/

Abstract

BACKGROUND

Haplotype reconstruction (phasing) is an essential step in many applications, including imputation and genomic selection. The best phasing methods rely on both familial and linkage disequilibrium (LD) information. With whole-genome sequence (WGS) data, relatively small samples of reference individuals are generally sequenced due to prohibitive sequencing costs, thus only a limited amount of familial information is available. However, reference individuals have many relatives that have been genotyped (at lower density). The goal of our study was to improve phasing of WGS data by integrating familial information from haplotypes that were obtained from a larger genotyped dataset and to quantify its impact on imputation accuracy.

RESULTS

Aligning a pre-phased WGS panel [5 million single nucleotide polymorphisms (SNPs)], which is based on LD information only, to a 50k SNP array that is phased with both LD and familial information (called the scaffold) resulted in correctly assigning parental origin for 99.62% of the WGS SNPs, their phase being determined unambiguously based on parental genotypes. Without the 50k haplotypes as scaffold, that value dropped, as expected, to 50%. Correctly phased segments were on average longer after alignment to the genotype phase, while the number of switches decreased slightly. Most of the incorrectly assigned segments, and the subsequent switches, were due to singleton errors. Imputation from the 50k SNP array to WGS data with improved phasing had a marginal impact on imputation accuracy (measured as r²): on average, 90.47% with traditional techniques versus 90.65% with pre-phasing that integrates familial information. Differences were larger for SNPs located at chromosome ends and for rare variants. Using a denser WGS panel (13 million SNPs) that was obtained with traditional variant filtering rules, we found similar results, although both phasing and imputation accuracy were lower.
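
The accuracy figures above are correlation-based. As a minimal sketch, assuming accuracy is the squared Pearson correlation between true genotypes (coded 0/1/2) and imputed allele dosages for one SNP, it could be computed as follows. The function name `imputation_r2` and the per-SNP layout are illustrative assumptions, not taken from the paper.

```python
import math


def imputation_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Both arguments are equal-length sequences with one value per individual
    for a single SNP (hypothetical layout, chosen for illustration).
    """
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(true_genotypes) / n
    mean_imp = sum(imputed_dosages) / n

    # Sums of squares and cross-products around the means.
    cov = sum((t - mean_true) * (d - mean_imp)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_true = sum((t - mean_true) ** 2 for t in true_genotypes)
    var_imp = sum((d - mean_imp) ** 2 for d in imputed_dosages)

    # A monomorphic SNP has no variance, so the correlation is undefined.
    if var_true == 0 or var_imp == 0:
        return math.nan
    return (cov * cov) / (var_true * var_imp)
```

Averaging this value over SNPs gives a panel-level figure comparable in kind to the percentages quoted above. Rare variants tend to have low genotype variance, which is one reason their r² is more sensitive to a few imputation errors.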

CONCLUSIONS

We present a phasing strategy for WGS data that indirectly integrates familial information: WGS haplotypes that are pre-phased with LD information only are aligned to haplotypes obtained from genotyping data, which are phased with both LD and familial information in a much larger population. This strategy results in very few mismatches with the phase obtained by Mendelian segregation rules. Finally, we propose a strategy to further improve phasing accuracy based on haplotype clusters obtained with genotyping data.
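
The alignment idea can be illustrated with a small sketch. It is not the authors' implementation. It assumes one individual, two LD-phased WGS haplotypes in arbitrary order, and a scaffold that gives the (paternal, maternal) alleles at the subset of WGS positions that are also on the array. Each WGS SNP takes the orientation of the nearest informative scaffold marker; this nearest-marker rule and the name `align_to_scaffold` are assumptions made for illustration.

```python
def align_to_scaffold(hap_a, hap_b, scaffold):
    """Assign parental origin to two LD-phased haplotypes using a scaffold.

    hap_a, hap_b: allele lists (0/1) of equal length, one entry per WGS SNP.
    scaffold: dict mapping a WGS SNP index to (paternal_allele, maternal_allele).
    Returns (paternal, maternal, switches), where `switches` counts orientation
    changes between consecutive informative scaffold markers.
    """
    # Orientation at each informative marker:
    # False means hap_a is paternal, True means hap_b is paternal.
    anchors = []
    for idx in sorted(scaffold):
        pat, mat = scaffold[idx]
        if pat == mat:
            continue  # homozygous scaffold marker carries no phase information
        observed = (hap_a[idx], hap_b[idx])
        if observed == (pat, mat):
            anchors.append((idx, False))
        elif observed == (mat, pat):
            anchors.append((idx, True))
        # Any other combination is a genotype disagreement and is skipped.

    if not anchors:
        raise ValueError("no informative scaffold marker")

    paternal, maternal = [], []
    k = 0  # index of the anchor currently nearest to SNP i
    for i in range(len(hap_a)):
        # Move to the next anchor while it is strictly closer to SNP i.
        while (k + 1 < len(anchors)
               and abs(anchors[k + 1][0] - i) < abs(anchors[k][0] - i)):
            k += 1
        flipped = anchors[k][1]
        paternal.append(hap_b[i] if flipped else hap_a[i])
        maternal.append(hap_a[i] if flipped else hap_b[i])

    switches = sum(1 for (_, x), (_, y) in zip(anchors, anchors[1:]) if x != y)
    return paternal, maternal, switches


# LD phasing made one switch error between SNP 2 and SNP 3.
hap_a = [0, 1, 1, 0, 1, 0]
hap_b = [1, 0, 0, 1, 0, 1]
scaffold = {1: (1, 0), 4: (0, 1)}
paternal, maternal, switches = align_to_scaffold(hap_a, hap_b, scaffold)
```

In this toy case the scaffold markers at SNP 1 and SNP 4 disagree on orientation, so the second half of the haplotypes is swapped and one switch is reported. Without any scaffold, the choice of which haplotype is paternal would be arbitrary, which matches the 50% figure quoted in the results.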


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b55a/5434521/64e1fff61054/12711_2017_321_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b55a/5434521/46596872f971/12711_2017_321_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b55a/5434521/9c51b745a2ee/12711_2017_321_Fig2_HTML.jpg
