A computational framework for improving genetic variants identification from 5,061 sheep sequencing data.

Authors

Xie Shangqian, Isaacs Karissa, Becker Gabrielle, Murdoch Brenda M

Affiliations

Department of Animal, Veterinary & Food Sciences, University of Idaho, Moscow, ID, USA.

Superior Farms, California, USA.

Publication information

J Anim Sci Biotechnol. 2023 Oct 2;14(1):127. doi: 10.1186/s40104-023-00923-3.

DOI:10.1186/s40104-023-00923-3
PMID:37779189
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10544426/
Abstract

BACKGROUND

Pan-genomics is a recently emerging strategy that can provide a more comprehensive characterization of genetic variation. Joint calling is routinely used to combine variants identified across multiple related samples. However, for population-scale genotyping, the improvement in variant identification gained from mutual support information across multiple samples remains quite limited.

RESULTS

In this study, we developed a computational framework for joint calling of genetic variants from 5,061 sheep that incorporates sequencing error and optimizes the mutual support information across samples. Variants were identified from multiple samples in four steps: (1) the probabilities of variants called by two widely used algorithms, GATK and Freebayes, were calculated with a Poisson model that incorporates base sequencing error potential; (2) variants with high mapping quality, or consistently identified in at least two samples by GATK and Freebayes, were used to construct the raw high-confidence identification (rHID) variant database; (3) the high-confidence variants identified in a single sample were ordered by probability value and controlled by false discovery rate (FDR) using the rHID database; (4) to avoid eliminating potentially true variants from the rHID database, variants that failed FDR control were reexamined to rescue potentially true variants and ensure highly accurate variant identification. After applying the new method, the percentage of concordant SNPs and indels between Freebayes and GATK improved significantly, by 12%-32%, compared with the raw variants. The method also identified low-frequency variants in individual sheep in genes involved in several traits, including nipple number (GPC5), scrapie pathology (PAPSS2), seasonal reproduction and litter size (GRM1), coat color (RAB27A), and lentivirus susceptibility (TMEM154).
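
The four steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the Poisson null rate (depth times the Phred-implied error rate), the thresholds (`min_mapq=40`, `alpha=0.05`), the empirical FDR estimate (fraction of ranked variants lacking rHID support), the rescue rule, and all function and field names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def poisson_pvalue(alt_reads, depth, base_qual):
    """Step 1 (assumed form): P(X >= alt_reads) for X ~ Poisson(depth * e),
    where e = 10^(-Q/10) is the error rate implied by Phred score Q."""
    lam = depth * 10 ** (-base_qual / 10)
    if alt_reads <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # First tail term, computed in log space to avoid overflow.
    term = math.exp(-lam + alt_reads * math.log(lam) - math.lgamma(alt_reads + 1))
    total, i = 0.0, alt_reads
    for _ in range(1000):  # the tail converges quickly when lam is small
        total += term
        i += 1
        term *= lam / i
    return min(1.0, total)


def build_rhid(calls, min_mapq=40, min_samples=2):
    """Step 2 (assumed rule): keep a site if any call has high mapping quality,
    or if GATK and Freebayes both call it in at least `min_samples` samples.
    `calls` is an iterable of dicts with keys: sample, caller, site, mapq."""
    callers_seen = defaultdict(set)
    high_mapq = set()
    for c in calls:
        callers_seen[(c["site"], c["sample"])].add(c["caller"])
        if c["mapq"] >= min_mapq:
            high_mapq.add(c["site"])
    agreeing_samples = defaultdict(set)
    for (site, sample), callers in callers_seen.items():
        if {"GATK", "Freebayes"} <= callers:
            agreeing_samples[site].add(sample)
    consensus = {s for s, smp in agreeing_samples.items() if len(smp) >= min_samples}
    return high_mapq | consensus


def fdr_filter(sample_variants, rhid, alpha=0.05):
    """Steps 3 and 4 (assumed rules): rank one sample's variants by p-value,
    treat variants absent from rHID as putative false discoveries, and keep
    the longest ranked prefix whose estimated FDR is <= alpha. Variants that
    fail but are present in rHID are returned separately as rescued."""
    ranked = sorted(sample_variants, key=lambda v: v[1])
    unsupported, cutoff = 0, 0
    for rank, (site, _) in enumerate(ranked, start=1):
        if site not in rhid:
            unsupported += 1
        if unsupported / rank <= alpha:
            cutoff = rank
    passed = [site for site, _ in ranked[:cutoff]]
    rescued = [site for site, _ in ranked[cutoff:] if site in rhid]
    return passed, rescued
```

In this sketch a site would be any hashable key, for example a `(chrom, pos, ref, alt)` tuple, and `sample_variants` a list of `(site, pvalue)` pairs produced by `poisson_pvalue`. The thresholds are placeholders and would need to be tuned against real data.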

CONCLUSION

The new method uses a computational strategy to reduce the number of false positives while improving the identification of genetic variants. The strategy incurs no extra cost, because it requires no additional samples or sequencing data, and it identifies rare variants that can be important for practical applications in animal breeding.

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b04/10544426/1a775f82ad20/40104_2023_923_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b04/10544426/3339583313eb/40104_2023_923_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b04/10544426/401445647246/40104_2023_923_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b04/10544426/a340146c834a/40104_2023_923_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b04/10544426/aad5a427689b/40104_2023_923_Fig5_HTML.jpg

Similar articles

1
A computational framework for improving genetic variants identification from 5,061 sheep sequencing data.
J Anim Sci Biotechnol. 2023 Oct 2;14(1):127. doi: 10.1186/s40104-023-00923-3.
2
Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3
Identification of missing variants by combining multiple analytic pipelines.
BMC Bioinformatics. 2018 Apr 16;19(1):139. doi: 10.1186/s12859-018-2151-0.
4
The GATK joint genotyping workflow is appropriate for calling variants in RNA-seq experiments.
J Anim Sci Biotechnol. 2019 Jun 21;10:44. doi: 10.1186/s40104-019-0359-0. eCollection 2019.
5
Detailed comparison of two popular variant calling packages for exome and targeted exon studies.
PeerJ. 2014 Sep 30;2:e600. doi: 10.7717/peerj.600. eCollection 2014.
6
Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance.
BMC Res Notes. 2014 Oct 22;7:747. doi: 10.1186/1756-0500-7-747.
7
Using whole genome sequence to compare variant callers and breed differences of US sheep.
Front Genet. 2023 Jan 4;13:1060882. doi: 10.3389/fgene.2022.1060882. eCollection 2022.
8
Calling known variants and identifying new variants while rapidly aligning sequence data.
J Dairy Sci. 2019 Apr;102(4):3216-3229. doi: 10.3168/jds.2018-15172. Epub 2019 Feb 14.
9
Huvariome: a web server resource of whole genome next-generation sequencing allelic frequencies to aid in pathological candidate gene selection.
J Clin Bioinforma. 2012 Nov 19;2(1):19. doi: 10.1186/2043-9113-2-19.
10
Comparison of Multi-Sample Variant Calling Methods for Whole Genome Sequencing.
IEEE Int Conf Systems Biol. 2014 Oct;2014:59-62. doi: 10.1109/ISB.2014.6990432.

Cited by

1
Genome-Wide Association Study of Body Weight Traits in Texel and Kazakh Crossbred Sheep.
Genes (Basel). 2024 Nov 27;15(12):1521. doi: 10.3390/genes15121521.
2
Searching for homozygous haplotype deficiency in Manech Tête Rousse dairy sheep revealed a nonsense variant in the MMUT gene affecting newborn lamb viability.
Genet Sel Evol. 2024 Feb 29;56(1):16. doi: 10.1186/s12711-024-00886-7.
