Identification of Low-Confidence Regions in the Pig Reference Genome (Sscrofa10.2).

Authors

Warr Amanda, Robert Christelle, Hume David, Archibald Alan L, Deeb Nader, Watson Mick

Affiliations

Division of Genetics and Genomics, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK.

Genus plc., Hendersonville, TN, USA.

Publication information

Front Genet. 2015 Nov 27;6:338. doi: 10.3389/fgene.2015.00338. eCollection 2015.

DOI:10.3389/fgene.2015.00338

PMID:26640477

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC4662242/

Abstract

Many applications of high throughput sequencing rely on the availability of an accurate reference genome. Variant calling often produces large data sets that cannot be realistically validated and which may contain large numbers of false-positives. Errors in the reference assembly increase the number of false-positives. While resources are available to aid in the filtering of variants from human data, for other species these do not yet exist and strict filtering techniques must be employed which are more likely to exclude true-positives. This work assesses the accuracy of the pig reference genome (Sscrofa10.2) using whole genome sequencing reads from the Duroc sow whose genome the assembly was based on. Indicators of structural variation including high regional coverage, unexpected insert sizes, improper pairing and homozygous variants were used to identify low quality (LQ) regions of the assembly. Low coverage (LC) regions were also identified and analyzed separately. The LQ regions covered 13.85% of the genome, the LC regions covered 26.6% of the genome and combined (LQLC) they covered 33.07% of the genome. Over half of dbSNP variants were located in the LQLC regions. Of copy number variable regions identified in a previous study, 86.3% were located in the LQLC regions. The regions were also enriched for gene predictions from RNA-seq data with 42.98% falling in the LQLC regions. Excluding variants in the LQ, LC, or LQLC from future analyses will help reduce the number of false-positive variant calls. Researchers using WGS data should be aware that the current pig reference genome does not give an accurate representation of the copy number of alleles in the original Duroc sow's genome.
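
As a rough illustration of the exclusion step the abstract recommends, the sketch below drops variant calls whose positions fall inside low-confidence intervals. Everything concrete in it is an assumption made for illustration: the interval coordinates, the chromosome names, the 0-based half-open coordinate convention, and the helper names `merge_intervals`, `build_index`, and `in_regions` are hypothetical and are not taken from the paper. The final lines work through the overlap implied by the reported percentages: LQ (13.85%) and LC (26.6%) sum to 40.45%, but their union covers 33.07%, so the two sets must share about 7.38% of the genome.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals per chromosome.

    Coordinates are assumed to be 0-based and half-open (an illustrative choice).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= out[-1][1]:
                # Overlaps or touches the previous span: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(span) for span in out]
    return merged


def build_index(intervals):
    """Return {chrom: (starts, ends)} built from the merged intervals."""
    merged = merge_intervals(intervals)
    return {
        chrom: ([s for s, _ in spans], [e for _, e in spans])
        for chrom, spans in merged.items()
    }


def in_regions(index, chrom, pos):
    """True if the 0-based position lies inside any merged interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


# Hypothetical low-quality (LQ) and low-coverage (LC) regions.
lq_regions = [("chr1", 100, 200), ("chr1", 150, 300)]
lc_regions = [("chr1", 250, 400), ("chr2", 0, 50)]

# The combined LQLC set is the union of both region lists.
lqlc_index = build_index(lq_regions + lc_regions)

# Hypothetical variant calls as (chrom, 0-based position).
variants = [("chr1", 99), ("chr1", 100), ("chr1", 399),
            ("chr1", 400), ("chr2", 10), ("chr3", 5)]
kept = [v for v in variants if not in_regions(lqlc_index, *v)]

# Inclusion-exclusion on the percentages reported in the abstract.
overlap_pct = round(13.85 + 26.6 - 33.07, 2)
```

Merging before lookup keeps each query to a single binary search per variant. With real data, the same exclusion would more likely be run with an interval tool on region and variant files rather than on in-memory tuples.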

Similar articles

Positional bias in variant calls against draft reference assemblies.

BMC Genomics. 2017 Mar 28;18(1):263. doi: 10.1186/s12864-017-3637-2.

Huvariome: a web server resource of whole genome next-generation sequencing allelic frequencies to aid in pathological candidate gene selection.

J Clin Bioinforma. 2012 Nov 19;2(1):19. doi: 10.1186/2043-9113-2-19.

Alternate-locus aware variant calling in whole genome sequencing.

Genome Med. 2016 Dec 13;8(1):130. doi: 10.1186/s13073-016-0383-z.

Lessons for livestock genomics from genome and transcriptome sequencing in cattle and other mammals.

Genet Sel Evol. 2016 Aug 17;48(1):59. doi: 10.1186/s12711-016-0237-6.

misFinder: identify mis-assemblies in an unbiased manner using reference and paired-end reads.

BMC Bioinformatics. 2015 Nov 16;16:386. doi: 10.1186/s12859-015-0818-3.

tarSVM: Improving the accuracy of variant calls derived from microfluidic PCR-based targeted next generation sequencing using a support vector machine.

BMC Bioinformatics. 2016 Jun 10;17(1):233. doi: 10.1186/s12859-016-1108-4.

Medical implications of technical accuracy in genome sequencing.

Genome Med. 2016 Mar 2;8(1):24. doi: 10.1186/s13073-016-0269-0.

Optimal sequencing depth design for whole genome re-sequencing in pigs.

BMC Bioinformatics. 2019 Nov 8;20(1):556. doi: 10.1186/s12859-019-3164-z.

Comparison of the Equine Reference Sequence with Its Sanger Source Data and New Illumina Reads.

PLoS One. 2015 Jun 24;10(6):e0126852. doi: 10.1371/journal.pone.0126852. eCollection 2015.

Cited by

Development of a whole-exome sequencing kit to facilitate porcine biomedical research.

Genome Biol. 2025 May 8;26(1):118. doi: 10.1186/s13059-025-03589-4.

Genome-wide detection of CNV regions and their potential association with growth and fatness traits in Duroc pigs.

BMC Genomics. 2021 May 8;22(1):332. doi: 10.1186/s12864-021-07654-7.

A chromosome-level genome assembly for the Pacific oyster Crassostrea gigas.

Gigascience. 2021 Mar 25;10(3). doi: 10.1093/gigascience/giab020.

An improved pig reference genome sequence to enable pig genetics and genomics research.

Gigascience. 2020 Jun 1;9(6). doi: 10.1093/gigascience/giaa051.

CNV analysis of Meishan pig by next-generation sequencing and effects of gene CNV on pig reproductive traits.

J Anim Sci Biotechnol. 2020 Apr 21;11:42. doi: 10.1186/s40104-020-00442-5. eCollection 2020.

Whole genome SNPs discovery in Nero Siciliano pig.

Genet Mol Biol. 2019 Jul-Sep;42(3):594-602. doi: 10.1590/1678-4685-GMB-2018-0169. Epub 2019 Nov 14.

CNVcaller: highly efficient and widely applicable software for detecting copy number variations in large populations.

Gigascience. 2017 Dec 1;6(12):1-12. doi: 10.1093/gigascience/gix115.

RNA sequencing reveals candidate genes and polymorphisms related to sperm DNA integrity in testis tissue from boars.

BMC Vet Res. 2017 Nov 28;13(1):362. doi: 10.1186/s12917-017-1279-x.

Genome-wide analysis of structural variants reveals genetic differences in Chinese pigs.

PLoS One. 2017 Oct 24;12(10):e0186721. doi: 10.1371/journal.pone.0186721. eCollection 2017.

Rapid Increase in Genome Size as a Consequence of Transposable Element Hyperactivity in Wood-White (Leptidea) Butterflies.

Genome Biol Evol. 2017 Oct 1;9(10):2491-2505. doi: 10.1093/gbe/evx163.
