
Extensive sequence duplication in Arabidopsis revealed by pseudo-heterozygosity.

Affiliations

Gregor Mendel Institute, Austrian Academy of Sciences, Vienna Biocenter, Vienna, Austria.

Department of Plant Sciences, University of Cambridge, Cambridge, UK.

Publication information

Genome Biol. 2023 Mar 9;24(1):44. doi: 10.1186/s13059-023-02875-3.

DOI:10.1186/s13059-023-02875-3
PMID:36895055
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9999624/
Abstract

BACKGROUND

It is apparent that genomes harbor much structural variation that is largely undetected for technical reasons. Such variation can cause artifacts when short-read sequencing data are mapped to a reference genome. Spurious SNPs may result from mapping of reads to unrecognized duplicated regions. Calling SNPs using the raw reads of the 1001 Arabidopsis Genomes Project, we identified 3.3 million (44%) heterozygous SNPs. Given that Arabidopsis thaliana (A. thaliana) is highly selfing, and that extensively heterozygous individuals have been removed, we hypothesized that these SNPs reflect cryptic copy number variation.
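The mechanism described above can be illustrated with a toy example. The following is a minimal sketch, not the authors' pipeline: it assumes a fully inbred individual, a hypothetical naive genotype caller based on the alternate-allele read fraction (`call_genotype`, with an arbitrary `min_frac` cutoff), and a duplicate copy that is absent from the reference, so that its reads pile up on the single annotated locus.

```python
def pileup_counts(copies, depth_per_copy):
    """Count ref/alt reads when every genomic copy maps to one reference locus.

    `copies` lists the allele carried by each copy in the sequenced genome
    ("ref" or "alt"); `depth_per_copy` is an assumed uniform read depth.
    """
    ref = sum(depth_per_copy for allele in copies if allele == "ref")
    alt = sum(depth_per_copy for allele in copies if allele == "alt")
    return ref, alt


def call_genotype(ref_count, alt_count, min_frac=0.2):
    """Hypothetical naive caller: classify a site by alternate-allele fraction."""
    total = ref_count + alt_count
    if total == 0:
        return "missing"
    alt_frac = alt_count / total
    if alt_frac < min_frac:
        return "hom_ref"
    if alt_frac > 1 - min_frac:
        return "hom_alt"
    return "het"


# Inbred line with only the annotated copy: all reads carry the reference allele.
single_copy = call_genotype(*pileup_counts(["ref"], depth_per_copy=20))

# Inbred line with an extra, unannotated copy that differs at this site:
# reads from both copies collapse onto the same reference position.
with_duplicate = call_genotype(*pileup_counts(["ref", "alt"], depth_per_copy=20))

print(single_copy, with_duplicate)
```

In this sketch the second individual is homozygous at every true locus, yet the collapsed reads produce a heterozygous call, which is the kind of artifact the abstract refers to as pseudo-heterozygosity.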

RESULTS

The heterozygosity we observe consists of particular SNPs being heterozygous across individuals in a manner that strongly suggests it reflects shared segregating duplications rather than random tracts of residual heterozygosity due to occasional outcrossing. Focusing on such pseudo-heterozygosity in annotated genes, we use genome-wide association to map the position of the duplicates. We identify 2500 putatively duplicated genes and validate them using de novo genome assemblies from six lines. Specific examples include an annotated gene and a nearby transposon that transpose together. We also demonstrate that cryptic structural variation produces highly inaccurate estimates of DNA methylation polymorphism.
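The distinction drawn above, between the same SNPs being heterozygous in many individuals and scattered residual heterozygosity, can be sketched as a per-SNP summary. This is an illustrative sketch only: the genotype encoding, the function names, and the `min_fraction` cutoff are assumptions made for the example and are not taken from the paper.

```python
def het_fraction_per_snp(genotypes):
    """Fraction of called individuals that are heterozygous at each SNP.

    `genotypes` maps a SNP id to a list of calls, one per individual:
    "hom_ref", "het", "hom_alt", or None for a missing call.
    """
    fractions = {}
    for snp, calls in genotypes.items():
        called = [c for c in calls if c is not None]
        if called:
            fractions[snp] = sum(c == "het" for c in called) / len(called)
        else:
            fractions[snp] = 0.0
    return fractions


def flag_shared_het(genotypes, min_fraction=0.5):
    """Return SNPs whose heterozygous calls recur across individuals.

    The cutoff is arbitrary; it only serves to separate recurrent
    heterozygous calls from isolated ones in this toy data set.
    """
    fractions = het_fraction_per_snp(genotypes)
    return sorted(snp for snp, frac in fractions.items() if frac >= min_fraction)


# Toy data for five inbred individuals (hypothetical SNP ids).
toy = {
    "snp_a": ["het", "het", "hom_ref", "het", None],
    "snp_b": ["hom_ref", "hom_alt", "hom_ref", "hom_ref", "hom_alt"],
    "snp_c": ["het", "hom_ref", "hom_ref", "hom_ref", "hom_ref"],
}

fractions = het_fraction_per_snp(toy)
flagged = flag_shared_het(toy)
print(fractions, flagged)
```

A SNP such as `snp_a`, heterozygous in most individuals of a selfing species, is a candidate for a shared duplication, whereas a single heterozygous call as in `snp_c` is more consistent with residual heterozygosity or noise.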

CONCLUSIONS

Our study confirms that most heterozygous SNP calls in A. thaliana are artifacts and suggests that great caution is needed when analyzing SNP data from short-read sequencing. The finding that 10% of annotated genes exhibit copy-number variation, and the realization that neither gene nor transposon annotation necessarily tells us what is actually mobile in the genome, suggest that future analyses based on independently assembled genomes will be very informative.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a13/9999624/04276f55ef46/13059_2023_2875_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a13/9999624/d62b6964e942/13059_2023_2875_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a13/9999624/20c940fc9687/13059_2023_2875_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a13/9999624/2b1a2a99a02f/13059_2023_2875_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a13/9999624/d0e167b88843/13059_2023_2875_Fig5_HTML.jpg
