GeneToCN: an alignment-free method for gene copy number estimation directly from next-generation sequencing reads.

Affiliation

Institute of Molecular and Cell Biology, University of Tartu, 23 Riia Str., 51010, Tartu, Estonia.

Publication information

Sci Rep. 2023 Oct 18;13(1):17765. doi: 10.1038/s41598-023-44636-z.

DOI: 10.1038/s41598-023-44636-z
PMID: 37853040
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10584998/
Abstract

Genomes exhibit large regions with segmental copy number variation, many of which include entire genes and are multiallelic. We have developed a computational method, GeneToCN, that counts the frequencies of gene-specific k-mers in FASTQ files and uses this information to infer the copy number of the gene. We validated the copy number predictions for amylase genes (AMY1, AMY2A, AMY2B) using experimental data from digital droplet PCR (ddPCR) on 39 individuals and observed a strong correlation (R = 0.99) between GeneToCN predictions and experimentally determined copy numbers. An additional validation on FCGR3 genes showed a higher concordance for FCGR3A compared to two other methods, but reduced accuracy for FCGR3B. We further tested the method on three different genomic regions (SMN, NPY4R, and LPA Kringle IV-2 domain). Predicted copy number distributions of these genes in a set of 500 individuals from the Estonian Biobank were in good agreement with previously published studies. In addition, we investigated the possibility of using GeneToCN on sequencing data generated by different technologies by comparing copy number predictions from Illumina, PacBio, and Oxford Nanopore data of the same sample. Despite the differences in variability of k-mer frequencies, all three sequencing technologies give similar predictions with GeneToCN.
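
The core idea in the abstract (count gene-specific k-mers in FASTQ reads, then infer copy number from their frequencies) can be illustrated with a minimal Python sketch. This is not the GeneToCN implementation. The abstract does not describe the normalization, so the sketch assumes that gene k-mer depth is divided by the depth of k-mers from a single-copy reference region and scaled by ploidy. All function names, the toy k-mers, and the value of `k` are hypothetical.

```python
from collections import Counter
from statistics import median

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def read_fastq_sequences(lines):
    """Yield the sequence line (second line) of each 4-line FASTQ record."""
    for index, line in enumerate(lines):
        if index % 4 == 1:
            yield line.strip().upper()


def count_target_kmers(sequences, targets, k):
    """Count how often each target k-mer (in canonical form) occurs in the reads."""
    wanted = {canonical(t) for t in targets}
    counts = Counter({t: 0 for t in wanted})  # keep zero counts for unseen k-mers
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            kmer = canonical(seq[start:start + k])
            if kmer in wanted:
                counts[kmer] += 1
    return counts


def estimate_copy_number(gene_counts, ref_counts, ploidy=2):
    """ASSUMED normalization: ploidy * median gene depth / median single-copy depth."""
    ref_depth = median(ref_counts.values())
    if ref_depth == 0:
        raise ValueError("reference k-mers were not observed in the reads")
    return ploidy * median(gene_counts.values()) / ref_depth


# Toy data: hypothetical 4-mers stand in for real gene-specific k-mers.
reads = ["AAAC"] * 6 + ["CCAG"] * 3 + ["CTGG"]  # CTGG is the reverse complement of CCAG
gene = count_target_kmers(reads, ["AAAC"], k=4)
ref = count_target_kmers(reads, ["CCAG"], k=4)
copy_number = estimate_copy_number(gene, ref)
```

In this toy input the gene k-mer is seen 6 times and the reference k-mer 4 times (both strands combined), so the assumed formula gives 2 × 6 / 4 = 3 copies. Taking the median over many k-mers, rather than a single count, is one simple way to dampen the effect of sequencing errors and of k-mers that are not truly unique.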

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e475/10584998/65054b60721d/41598_2023_44636_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e475/10584998/ffbaeac5009f/41598_2023_44636_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e475/10584998/443b76bee556/41598_2023_44636_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e475/10584998/85f7118a3f46/41598_2023_44636_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e475/10584998/fd7df550d5bf/41598_2023_44636_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e475/10584998/31140a624c6f/41598_2023_44636_Fig6_HTML.jpg

Similar articles

1
GeneToCN: an alignment-free method for gene copy number estimation directly from next-generation sequencing reads.
Sci Rep. 2023 Oct 18;13(1):17765. doi: 10.1038/s41598-023-44636-z.
2
Copy number determination of the gene for the human pancreatic polypeptide receptor NPY4R using read depth analysis and droplet digital PCR.
BMC Biotechnol. 2019 Jun 4;19(1):31. doi: 10.1186/s12896-019-0523-9.
3
Differences in AMY1 Gene Copy Numbers Derived from Blood, Buccal Cells and Saliva Using Quantitative and Droplet Digital PCR Methods: Flagging the Pitfall.
PLoS One. 2017 Jan 26;12(1):e0170767. doi: 10.1371/journal.pone.0170767. eCollection 2017.
4
Obesity, starch digestion and amylase: association between copy number variants at human salivary (AMY1) and pancreatic (AMY2) amylase genes.
Hum Mol Genet. 2015 Jun 15;24(12):3472-80. doi: 10.1093/hmg/ddv098. Epub 2015 Mar 18.
5
AMYCNE: Confident copy number assessment using whole genome sequencing data.
PLoS One. 2018 Mar 26;13(3):e0189710. doi: 10.1371/journal.pone.0189710. eCollection 2018.
6
Association study of copy number variants in FCGR3A and FCGR3B gene with risk of ankylosing spondylitis in a Chinese population.
Rheumatol Int. 2016 Mar;36(3):437-42. doi: 10.1007/s00296-015-3384-0. Epub 2015 Oct 22.
7
Application of droplet digital PCR to determine copy number of endogenous genes and transgenes in sugarcane.
Plant Cell Rep. 2017 Nov;36(11):1775-1783. doi: 10.1007/s00299-017-2193-1. Epub 2017 Aug 28.
8
Analyzing Copy Number Variation with Droplet Digital PCR.
Methods Mol Biol. 2018;1768:143-160. doi: 10.1007/978-1-4939-7778-9_9.
9
Complex Copy Number Variation of AMY1 does not Associate with Obesity in two East Asian Cohorts.
Hum Mutat. 2016 Jul;37(7):669-78. doi: 10.1002/humu.22996. Epub 2016 Apr 28.
10
Allele-Specific Droplet Digital PCR Combined with a Next-Generation Sequencing-Based Algorithm for Diagnostic Copy Number Analysis in Genes with High Homology: Proof of Concept Using Stereocilin.
Clin Chem. 2018 Apr;64(4):705-714. doi: 10.1373/clinchem.2017.280685. Epub 2018 Jan 16.

Cited by

1
Y-mer: a k-mer based method for determining human Y chromosome haplogroups from ultra-low sequencing depth data.
Genome Biol. 2025 Aug 12;26(1):243. doi: 10.1186/s13059-025-03714-3.
2
Population-level gene copy number variations reveal distinct genetic properties of different Malus species.
BMC Genomics. 2025 Jul 23;26(1):687. doi: 10.1186/s12864-025-11677-9.
3
Direct long-read visualization reveals hidden variation in GCH1 gene copy number and precise expansion steps.
BMC Genomics. 2025 Jul 17;26(1):671. doi: 10.1186/s12864-025-11859-5.
4
Reconstruction of the human amylase locus reveals ancient duplications seeding modern-day variation.
Science. 2024 Nov 22;386(6724):eadn0609. doi: 10.1126/science.adn0609.

References

1
The complete sequence of a human genome.
Science. 2022 Apr;376(6588):44-53. doi: 10.1126/science.abj6987. Epub 2022 Mar 31.
2
DNA copy number variation: Main characteristics, evolutionary significance, and pathological aspects.
Biomed J. 2021 Oct;44(5):548-559. doi: 10.1016/j.bj.2021.02.003. Epub 2021 Feb 13.
3
Genomic Variability in the Survival Motor Neuron Genes ( and ): Implications for Spinal Muscular Atrophy Phenotype and Therapeutics Development.
Int J Mol Sci. 2021 Jul 23;22(15):7896. doi: 10.3390/ijms22157896.
4
KATK: Fast genotyping of rare variants directly from unmapped sequencing reads.
Hum Mutat. 2021 Jun;42(6):777-786. doi: 10.1002/humu.24197. Epub 2021 Apr 1.
5
A structural variation reference for medical and population genetics.
Nature. 2020 May;581(7809):444-451. doi: 10.1038/s41586-020-2287-8. Epub 2020 May 27.
6
Rapid, Paralog-Sensitive CNV Analysis of 2457 Human Genomes Using QuicK-mer2.
Genes (Basel). 2020 Jan 29;11(2):141. doi: 10.3390/genes11020141.
7
AluMine: alignment-free method for the discovery of polymorphic Alu element insertions.
Mob DNA. 2019 Jul 18;10:31. doi: 10.1186/s13100-019-0174-3. eCollection 2019.
8
Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing.
Genome Biol. 2019 Jun 3;20(1):117. doi: 10.1186/s13059-019-1720-5.
9
Deep coverage whole genome sequences and plasma lipoprotein(a) in individuals of European and African ancestries.
Nat Commun. 2018 Jul 4;9(1):2606. doi: 10.1038/s41467-018-04668-w.
10
Copy number of pancreatic polypeptide receptor gene NPY4R correlates with body mass index and waist circumference.
PLoS One. 2018 Apr 5;13(4):e0194668. doi: 10.1371/journal.pone.0194668. eCollection 2018.