Machine learning techniques for continuous genetic assignment of geographic origin of forest trees.

Authors

Degen Bernd, Yanbaev Yulai, Müller Niels A

Affiliations

Thünen Institute of Forest Genetics, Grosshansdorf, Germany.

Bashkir State Agrarian University, Ufa, Russia.

Publication information

PLoS One. 2025 Jun 6;20(6):e0324994. doi: 10.1371/journal.pone.0324994. eCollection 2025.

DOI:10.1371/journal.pone.0324994

PMID:40478860

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12143523/

Abstract

Origin tracking is important to ensure use of the right seed source and trade in legally harvested timber. Additionally, it can help to reconstruct human-caused historical long-distance seed transfer and to spot mislabelling in forest field trials. So far, genetic assignment approaches have mostly been discrete, assigning test samples to predefined groups. The main limitation of this approach is that such discrete groups are hard to justify when genetic variation across the landscape is actually continuous. Here, we compare the accuracy of five continuous assignment methods. Specifically, we test a nearest neighbour method (NN), direct Gaussian process regression (GPR-D) using the radial basis kernel function, grid-based Gaussian process regression (GPR-G) applying the Matérn kernel function, genomic prediction (GP) and deep learning (DL), using two genome-wide single nucleotide polymorphism (SNP) datasets of trees from across Europe. The first dataset comprises 30,000 SNPs from 865 European beech (Fagus sylvatica) trees; the second consists of 381 SNPs from 1,883 pedunculate oak (Quercus robur) trees. The accuracy, measured as the geographic distance between true and predicted locations, was highest for the GPR-G and DL methods with the beech dataset, with median distances of only 55 km and 76 km, respectively. For the oak data, GPR-G and DL also performed best, with median distances of 263 km and 278 km, respectively. The relative error (distance/max distance among tree pairs) was below 8% for 90% of all samples for the best method for both datasets. We detected 35 individuals and 10 groups as outliers in the beech data and 27 individuals and 18 groups in the oak data. These outliers may be caused by mislabelling or historical human-caused long-distance seed transfer. We discuss the differences in performance of the approaches and highlight future applications and potential for further improvements.
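To make the evaluation measure concrete, the following is a minimal sketch of a nearest-neighbour assignment together with the two error measures named in the abstract: the geographic distance between true and predicted location, and the relative error (distance divided by the maximum distance among tree pairs). It is not the authors' implementation. The abstract does not specify the genetic distance, the number of neighbours, or how coordinates are combined, so the mean absolute allele-count difference, the parameter `k`, the plain averaging of latitude and longitude, the haversine formula with a 6371 km Earth radius, and all function names and toy genotypes below are assumptions made for illustration only.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def genetic_distance(g1, g2):
    """Mean absolute difference of allele counts (0, 1, 2) across SNPs.

    Hypothetical choice: the abstract does not name the distance it uses.
    """
    return sum(abs(x - y) for x, y in zip(g1, g2)) / len(g1)


def nearest_neighbour_origin(test_genotype, ref_genotypes, ref_coords, k=1):
    """Predict (lat, lon) as the mean position of the k genetically closest references.

    Averaging latitude and longitude directly is a crude simplification
    that is only reasonable for small k and nearby points.
    """
    order = sorted(
        range(len(ref_genotypes)),
        key=lambda i: genetic_distance(test_genotype, ref_genotypes[i]),
    )
    nearest = order[:k]
    lat = sum(ref_coords[i][0] for i in nearest) / len(nearest)
    lon = sum(ref_coords[i][1] for i in nearest) / len(nearest)
    return (lat, lon)


def relative_error(true_xy, pred_xy, all_coords):
    """Prediction error divided by the largest pairwise distance among all trees."""
    max_dist = max(haversine_km(p, q) for p, q in combinations(all_coords, 2))
    return haversine_km(true_xy, pred_xy) / max_dist


# Toy data, invented for illustration: three reference trees with six SNPs each.
ref_coords = [(53.7, 10.2), (48.1, 11.6), (45.8, 4.8)]
ref_genotypes = [
    [0, 0, 1, 2, 2, 0],
    [1, 0, 1, 2, 1, 0],
    [2, 2, 0, 0, 1, 1],
]
test_genotype = [0, 0, 1, 2, 2, 1]
true_origin = (53.5, 10.0)

pred = nearest_neighbour_origin(test_genotype, ref_genotypes, ref_coords)
err_km = haversine_km(true_origin, pred)
rel = relative_error(true_origin, pred, ref_coords)
print(f"predicted origin: {pred}")
print(f"error: {err_km:.1f} km, relative error: {rel:.3f}")
```

In a real evaluation the test sample would be held out of the reference set (for example by leave-one-out), and the median of `err_km` over all test samples would correspond to the median distances reported above.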

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efcb/12143523/d13c9ffe7c13/pone.0324994.g001.jpg
