GCphase: an SNP phasing method using a graph partition and error correction algorithm.

Affiliation

School of Software, Henan Polytechnic University, Jiaozuo, 454003, China.

Publication information

BMC Bioinformatics. 2024 Aug 19;25(1):267. doi: 10.1186/s12859-024-05901-8.

DOI: 10.1186/s12859-024-05901-8
PMID: 39160480
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11331634/

Abstract

BACKGROUND

Long reads have become a popular choice for single nucleotide polymorphism (SNP) phasing, providing substantial support for research on human diseases and for genetic studies in animals and plants. However, because of the complex linkage relationships between SNP loci and the sequencing errors in the reads, recent methods still do not yield satisfactory results.

RESULTS

In this study, we present GCphase, a graph-based algorithm that uses a minimum-cut algorithm to perform phasing. First, based on the alignment of long reads to the reference genome, GCphase filters out ambiguous SNP sites and uninformative read information. Second, GCphase constructs a graph in which each vertex represents the alleles of an SNP locus and each edge represents the presence of supporting reads, and then applies a graph minimum-cut algorithm to phase the SNPs. Next, GCphase applies two error correction steps to refine the phasing results from the previous step, effectively reducing the error rate. Finally, GCphase outputs the phase blocks. GCphase was compared with three other methods, WhatsHap, HapCUT2, and LongPhase, on Nanopore and PacBio long-read datasets. The code is available at https://github.com/baimawjy/GCphase .
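The graph and minimum-cut steps can be illustrated with a minimal sketch. It rests on several assumptions: the genome is diploid and every SNP is biallelic; each read is a dict mapping SNP position to observed allele (0 or 1); each vertex is a (position, allele) pair; and an edge weight counts the reads in which both alleles co-occur. The function names are hypothetical. The cut uses the Stoer-Wagner global minimum cut, which is our own choice, since the abstract does not say which minimum-cut algorithm GCphase uses. Apart from dropping reads that cover fewer than two SNPs, the sketch omits GCphase's filtering of ambiguous sites and its two error-correction steps.

```python
from collections import defaultdict
from itertools import combinations


def build_allele_graph(reads, min_snps=2):
    """Hypothetical helper: build a weighted allele graph from reads.

    Each read is a dict {snp_position: allele}. Vertices are (position, allele)
    pairs; an edge weight counts the reads in which both alleles co-occur.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        if len(read) < min_snps:
            continue  # a read covering fewer than two SNPs links nothing
        for u, v in combinations(sorted(read.items()), 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


def stoer_wagner(graph):
    """Global minimum cut of an undirected weighted graph {u: {v: w}}.

    Returns (cut_weight, set of original vertices on one side of the cut).
    """
    groups = {v: {v} for v in graph}  # original vertices merged into each super-vertex
    adj = {u: dict(nbrs) for u, nbrs in graph.items()}
    best_weight, best_side = float("inf"), set()
    while len(adj) > 1:
        # Maximum-adjacency ordering: always add the vertex most tightly
        # connected to the vertices added so far.
        start = next(iter(adj))
        added = [start]
        weights = {v: adj[start].get(v, 0) for v in adj if v != start}
        while weights:
            nxt = max(weights, key=weights.get)
            last_weight = weights.pop(nxt)
            added.append(nxt)
            for v, w in adj[nxt].items():
                if v in weights:
                    weights[v] += w
        s, t = added[-2], added[-1]
        # Cutting the last-added vertex t from all others is a candidate minimum cut.
        if last_weight < best_weight:
            best_weight, best_side = last_weight, set(groups[t])
        # Merge t into s, summing parallel edge weights.
        for v, w in adj[t].items():
            if v != s:
                adj[s][v] = adj[s].get(v, 0) + w
                adj[v][s] = adj[v].get(s, 0) + w
            del adj[v][t]
        del adj[t]
        groups[s] |= groups.pop(t)
    return best_weight, best_side


def phase(reads):
    """Split allele vertices with a minimum cut and read one haplotype off one side."""
    graph = build_allele_graph(reads)
    cut_weight, side = stoer_wagner(graph)
    haplotype = {}
    for position, allele in sorted(side):
        # If both alleles of a locus land on this side, the cut is ambiguous
        # there; this sketch simply keeps the first allele it sees.
        haplotype.setdefault(position, allele)
    return cut_weight, haplotype
```

With good coverage, reads from each haplotype form a densely connected cluster of allele vertices, and sequencing errors add only sparse cross edges, so the cheapest cut tends to separate the two haplotypes. A global minimum cut can also isolate a single weakly covered vertex, which is one reason a practical tool surrounds the cut with filtering and correction steps.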

CONCLUSIONS

Experimental results show that, across datasets and sequencing depths, GCphase produces the fewest switch errors and achieves the highest accuracy of the methods compared.
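Switch errors, the metric reported above, count how often the predicted phase flips relative to the truth between consecutive heterozygous SNPs. The sketch below applies one common definition to a single phase block. The function name and the dict input format are hypothetical, and this section does not state the paper's exact evaluation procedure (for example, how unphased sites or block boundaries are handled).

```python
def switch_errors(predicted, truth):
    """Count switch errors between a predicted and a true haplotype.

    Both arguments map SNP position -> allele (0 or 1) on one haplotype of a
    single phase block; only positions present in both are compared.
    """
    shared = sorted(set(predicted) & set(truth))
    # 0 where the prediction matches this true haplotype, 1 where it matches the other one
    flips = [predicted[p] ^ truth[p] for p in shared]
    # A switch is any change in that relationship between consecutive SNPs
    return sum(1 for a, b in zip(flips, flips[1:]) if a != b)
```

A prediction that is the complement of the truth has zero switches, because it describes the same phasing. A single mis-phased SNP in the middle of a block counts as two switches under this definition, and some evaluations report such cases separately as flip errors. Dividing the count by the number of adjacent SNP pairs compared gives a switch error rate.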




Cited by

1. Genomic resources, opportunities, and prospects for accelerated improvement of millets.
   Theor Appl Genet. 2024 Nov 20;137(12):273. doi: 10.1007/s00122-024-04777-9.

References

1. Towards accurate, contiguous and complete alignment-based polyploid phasing algorithms.
   Genomics. 2022 May;114(3):110369. doi: 10.1016/j.ygeno.2022.110369. Epub 2022 Apr 26.
2. LongPhase: an ultra-fast chromosome-scale phasing algorithm for small and large variants.
   Bioinformatics. 2022 Mar 28;38(7):1816-1822. doi: 10.1093/bioinformatics/btac058.
3. phasebook: haplotype-aware de novo assembly of diploid genomes from long reads.
   Genome Biol. 2021 Oct 27;22(1):299. doi: 10.1186/s13059-021-02512-x.
4. Computational methods for chromosome-scale haplotype reconstruction.
   Genome Biol. 2021 Apr 12;22(1):101. doi: 10.1186/s13059-021-02328-9.
5. LDICDL: LncRNA-Disease Association Identification Based on Collaborative Deep Learning.
   IEEE/ACM Trans Comput Biol Bioinform. 2022 May-Jun;19(3):1715-1723. doi: 10.1109/TCBB.2020.3034910. Epub 2022 Jun 3.
6. Haplotype threading: accurate polyploid phasing from long reads.
   Genome Biol. 2020 Sep 21;21(1):252. doi: 10.1186/s13059-020-02158-1.
7. ComHapDet: a spatial community detection algorithm for haplotype assembly.
   BMC Genomics. 2020 Sep 9;21(Suppl 9):586. doi: 10.1186/s12864-020-06935-x.
8. Hap10: reconstructing accurate and long polyploid haplotypes using linked reads.
   BMC Bioinformatics. 2020 Jun 18;21(1):253. doi: 10.1186/s12859-020-03584-5.
9. Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads.
   Nat Commun. 2019 Nov 25;10(1):5360. doi: 10.1038/s41467-019-13355-3.
10. Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads.
   Ann Hum Genet. 2020 Mar;84(2):125-140. doi: 10.1111/ahg.12364. Epub 2019 Nov 11.