Machine-learning predicts genomic determinants of meiosis-driven structural variation in a eukaryotic pathogen.

Author information

Badet Thomas, Fouché Simone, Hartmann Fanny E, Zala Marcello, Croll Daniel

Affiliations

Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland.

Plant Pathology, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland.

Publication information

Nat Commun. 2021 Jun 10;12(1):3551. doi: 10.1038/s41467-021-23862-x.

DOI:10.1038/s41467-021-23862-x

PMID:34112792

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8192914/

Abstract

Species harbor extensive structural variation underpinning recent adaptive evolution. However, the causality between genomic features and the induction of new rearrangements is poorly established. Here, we analyze a global set of telomere-to-telomere genome assemblies of a fungal pathogen of wheat to establish a nucleotide-level map of structural variation. We show that the recent emergence of pesticide resistance has been disproportionally driven by rearrangements. We use machine learning to train a model on structural variation events based on 30 chromosomal sequence features. We show that base composition and gene density are the major determinants of structural variation. Retrotransposons explain most inversion, indel and duplication events. We apply our model to Arabidopsis thaliana and show that our approach extends to more complex genomes. Finally, we analyze complete genomes of haploid offspring in a four-generation pedigree. Meiotic crossover locations are enriched for new rearrangements consistent with crossovers being mutational hotspots. The model trained on species-wide structural variation accurately predicts the position of >74% of newly generated variants along the pedigree. The predictive power highlights causality between specific sequence features and the induction of chromosomal rearrangements. Our work demonstrates that training sequence-derived models can accurately identify regions of intrinsic DNA instability in eukaryotic genomes.
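The abstract describes training a model on structural variation events from chromosomal sequence features such as base composition. As a rough illustration only, the sketch below builds the kind of per-window feature table such a model might be trained on: one row per fixed-size window, with GC content as a single feature and a binary label marking overlap with a known structural variant. The function names, the window size, and the half-open interval convention are assumptions made for this example; the abstract does not specify the authors' 30 features, their windowing, or their learning algorithm.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # window contains only ambiguous bases (e.g. N)
    return (seq.count("G") + seq.count("C")) / acgt


def window_table(
    seq: str,
    sv_intervals: List[Tuple[int, int]],
    window: int = 10,
) -> List[Tuple[int, int, float, int]]:
    """Return (start, end, gc, label) rows for consecutive windows.

    Hypothetical helper: sv_intervals are half-open [start, end)
    coordinates of known structural variants; label is 1 when the
    window overlaps at least one of them, else 0.
    """
    rows = []
    for start in range(0, len(seq), window):
        end = min(start + window, len(seq))
        overlaps = any(s < end and e > start for s, e in sv_intervals)
        rows.append((start, end, gc_content(seq[start:end]), int(overlaps)))
    return rows


# Toy chromosome: a GC-rich window, an AT-rich window, a mixed window.
toy_seq = "GGGGGCCCCC" + "ATATATATAT" + "ACGTACGTAC"
toy_svs = [(12, 15)]  # one made-up variant inside the AT-rich window
rows = window_table(toy_seq, toy_svs, window=10)
```

A table like `rows` could then be passed to any standard classifier, with further columns (for example gene density or retrotransposon coverage, both named in the abstract) added as extra features.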

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/77fa/8192914/663d27148b07/41467_2021_23862_Fig1_HTML.jpg
