A graph clustering algorithm for detection and genotyping of structural variants from long reads.

Affiliation

Systems and Computing Engineering Department, Universidad de Los Andes, Bogotá 111711, Colombia.

Publication information

Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giad112.

DOI:10.1093/gigascience/giad112
PMID:38206589
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10783151/
Abstract

BACKGROUND

Structural variants (SVs) are genomic polymorphisms defined by their length (>50 bp). The usual types of SVs are deletions, insertions, translocations, inversions, and copy number variants. SV detection and genotyping are fundamental given the role of SVs in phenomena such as phenotypic variation and evolutionary events. Thus, methods to identify SVs using long-read sequencing data have recently been developed.

FINDINGS

We present an accurate and efficient algorithm to predict germline SVs from long-read sequencing data. The algorithm starts by collecting evidence (signatures) of SVs from read alignments. Then, signatures are clustered based on a Euclidean graph with coordinates calculated from lengths and genomic positions. Clustering is performed by the DBSCAN algorithm, which provides the advantage of delimiting clusters with high resolution. Clusters are transformed into SVs, and a Bayesian model genotypes each SV precisely based on its supporting evidence. This algorithm is integrated into the single sample variants detector of the Next Generation Sequencing Experience Platform, which facilitates integration with other functionalities for genomics analysis. We performed multiple benchmark experiments, including simulated and real data, representing different genome profiles, sequencing technologies (PacBio HiFi, ONT), and read depths.
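The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each signature is reduced to a (genomic position, length) pair used directly as Euclidean coordinates, and the names `dbscan`, `eps`, `min_pts`, and the sample signatures are hypothetical choices made for illustration only.

```python
from math import hypot
from statistics import median


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: return one label per point (0, 1, ... or -1 for noise)."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # All points within Euclidean distance eps of point i (including i).
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but does not expand
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep expanding
        cluster += 1
    return labels


# Hypothetical insertion signatures as (position, length) pairs.
signatures = [(1000, 300), (1004, 310), (998, 295), (1010, 305),
              (5000, 80), (5003, 82), (5001, 79),
              (20000, 600)]

labels = dbscan(signatures, eps=25, min_pts=3)

# Collapse each cluster into one candidate SV (median position and length).
calls = {}
for cid in sorted(set(labels) - {-1}):
    members = [s for s, lab in zip(signatures, labels) if lab == cid]
    calls[cid] = (median(p for p, _ in members), median(l for _, l in members))
```

With these assumed parameters, the first four signatures form one cluster, the next three form a second, and the isolated signature at position 20000 is left as noise because it lacks enough supporting neighbours.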

CONCLUSION

The results show that our approach outperformed state-of-the-art tools in germline SV calling and genotyping, especially at low depths and in error-prone repetitive regions. We believe this work contributes significantly to the development of bioinformatic strategies to maximize the use of long-read sequencing technologies.
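To make the effect of read depth on Bayesian genotyping concrete, the following sketch computes genotype posteriors from read counts. The abstract does not specify the model, so everything here is an assumption: a binomial likelihood over supporting reads, a flat prior over three diploid genotypes, a hypothetical `error` rate of 0.05, and the function name `genotype_posteriors`.

```python
from math import comb


def genotype_posteriors(alt_reads, total_reads, error=0.05,
                        priors=(1 / 3, 1 / 3, 1 / 3)):
    """Posterior probability of 0/0, 0/1 and 1/1 given SV-supporting reads.

    Assumed model: each spanning read supports the SV with probability
    `error` (0/0), 0.5 (0/1) or 1 - error (1/1), independently.
    """
    alt_prob = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    weighted = {}
    for (genotype, p), prior in zip(alt_prob.items(), priors):
        likelihood = (comb(total_reads, alt_reads)
                      * p ** alt_reads
                      * (1 - p) ** (total_reads - alt_reads))
        weighted[genotype] = prior * likelihood
    norm = sum(weighted.values())
    return {genotype: w / norm for genotype, w in weighted.items()}


# Same allele fraction (all reads support the SV) at two depths.
low_depth = genotype_posteriors(alt_reads=2, total_reads=2)
high_depth = genotype_posteriors(alt_reads=10, total_reads=10)

# Balanced support at moderate depth points to a heterozygous call.
het = genotype_posteriors(alt_reads=5, total_reads=10)
```

Under this assumed model, two supporting reads out of two still favour the homozygous genotype, but with a noticeably lower posterior than ten out of ten, which is why genotyping accuracy at low depth is a meaningful benchmark.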

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f21a/10783151/018c867f830e/giad112fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f21a/10783151/db3efadf4a77/giad112fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f21a/10783151/48f5ef1490c7/giad112fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f21a/10783151/d591961a7204/giad112fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f21a/10783151/cfd678e74ad8/giad112fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f21a/10783151/2708e30222fc/giad112fig6.jpg
