Majumder Sambadi, Mason Chase M
Department of Biology, University of Central Florida, Orlando, Florida 32816, USA.
Department of Biology, Irving K. Barber Faculty of Science, The University of British Columbia Okanagan, Kelowna, British Columbia V1V 1V7, Canada.
Appl Plant Sci. 2025 Jun 18;13(3):e70015. doi: 10.1002/aps3.70015. eCollection 2025 May-Jun.
Here, we demonstrate the application of interpretable machine learning methods to investigate intraspecific functional trait divergence, using diverse genotypes of the wide-ranging sunflower sampled from populations across two contrasting ecoregions: the Great Plains and the North American Deserts.
Recursive feature elimination was applied to functional trait data from the HeliantHOME database, followed by the application of the Boruta algorithm to detect the traits that are most predictive of ecoregion. Random forest and gradient boosting machine classifiers were then trained and validated, with results visualized using accumulated local effects plots.
The most ecoregion-predictive functional traits span categories of leaf economics, plant architecture, reproductive phenology, and floral and seed morphology. Relative to the Great Plains, genotypes from the North American Deserts exhibit shorter stature, fewer leaves, higher leaf nitrogen content, and longer average length of phyllaries.
This approach readily identifies traits predictive of ecoregion origin, and thus the functional traits most likely to be responsible for contrasting ecological strategies across the landscape. This type of approach can be used to parse large plant trait datasets in a wide range of contexts, including explicitly testing the applicability of interspecific paradigms at intraspecific scales.