College of Computer and Information Engineering, Inner Mongolia Agricultural University, Erdos East Street No. 29, Hohhot, 010011, China.
Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Zhaowuda Road No. 306, Hohhot, 010018, China.
BMC Bioinformatics. 2023 Oct 11;24(1):384. doi: 10.1186/s12859-023-05514-7.
As the cost of high-throughput sequencing has fallen sharply, genomic selection has developed rapidly in plant breeding. Although researchers have proposed numerous genomic selection methods, existing methods still suffer from poor prediction accuracy in practical applications.
This paper proposes MSXFGP, a genomic prediction method that uses a multi-strategy improved sparrow search algorithm (SSA) to optimize XGBoost parameters and feature selection. First, logistic chaos mapping, elite learning, adaptive parameter adjustment, Levy flight, and an early stop strategy are incorporated into the SSA. These additions strengthen the algorithm's global and local search capabilities, improving its convergence accuracy and stability. The improved SSA is then used to optimize XGBoost parameters and feature selection simultaneously, yielding a new genomic selection method, MSXFGP. Using both the coefficient of determination (R²) and the Pearson correlation coefficient as evaluation metrics, MSXFGP was compared with six existing genomic selection models across six datasets. The results show that MSXFGP's prediction accuracy is comparable to or better than that of widely used genomic selection methods, and that it performs better when R² is used as the evaluation metric. The study also provides a user-friendly Python tool to help breeders apply the method. MSXFGP is available at https://github.com/DIBreeding/MSXFGP .
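To make the idea of optimizing XGBoost parameters and feature selection in one search more concrete, the following minimal sketch shows one way a single search-agent position could encode both, with the population initialized by a logistic chaotic map. The abstract does not describe the actual encoding, so everything here is an assumption for illustration: the parameter names and ranges in PARAM_BOUNDS, the 0.5 selection threshold, and the helper names logistic_population and decode are hypothetical and are not taken from the MSXFGP code.

```python
import random

# Hypothetical search ranges for three XGBoost hyperparameters
# (assumed for illustration; not taken from the paper).
PARAM_BOUNDS = {
    "learning_rate": (0.01, 0.3),
    "max_depth": (3, 10),
    "subsample": (0.5, 1.0),
}
INT_PARAMS = {"max_depth"}


def logistic_population(n_agents, dim, mu=4.0, seed=0):
    """Fill an n_agents x dim population with logistic-map values in [0, 1]."""
    rng = random.Random(seed)
    x = rng.uniform(0.1, 0.9)  # random starting point for the chaotic sequence
    population = []
    for _ in range(n_agents):
        row = []
        for _ in range(dim):
            x = mu * x * (1.0 - x)  # logistic map: x_{k+1} = mu * x_k * (1 - x_k)
            row.append(x)
        population.append(row)
    return population


def decode(position, n_features, threshold=0.5):
    """Split one position vector into (hyperparameter dict, selected feature indices)."""
    names = list(PARAM_BOUNDS)
    params = {}
    # The first len(names) entries are scaled into the hyperparameter ranges.
    for name, u in zip(names, position):
        lo, hi = PARAM_BOUNDS[name]
        value = lo + u * (hi - lo)
        params[name] = int(round(value)) if name in INT_PARAMS else value
    # The remaining entries act as a feature mask: values above the threshold select a marker.
    mask_part = position[len(names):len(names) + n_features]
    selected = [j for j, u in enumerate(mask_part) if u > threshold]
    if not selected:  # always keep at least one marker
        selected = [max(range(n_features), key=lambda j: mask_part[j])]
    return params, selected


if __name__ == "__main__":
    n_features = 8
    agents = logistic_population(5, len(PARAM_BOUNDS) + n_features)
    for agent in agents:
        print(decode(agent, n_features))
```

In a full optimizer, each decoded pair would be scored by training XGBoost on the selected markers with the decoded parameters and measuring cross-validated accuracy; that fitness step and the SSA update rules are omitted here.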
The experimental results show that the prediction accuracy of MSXFGP is comparable to or better than that of existing genomic selection methods, providing a new approach for genomic selection in plants.
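The two evaluation metrics named in the abstract, the coefficient of determination (R²) and the Pearson correlation coefficient, can rank models differently, which may help explain why results are reported under both. The sketch below uses the standard textbook definitions; the function names r_squared and pearson_r are illustrative, and the paper's own evaluation code may differ.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [2.0, 4.0, 6.0, 8.0]  # perfectly correlated, but on the wrong scale
    print(pearson_r(observed, predicted))  # 1.0
    print(r_squared(observed, predicted))  # -5.0
```

Pearson correlation rewards predictions that rank phenotypes correctly even when they are biased or mis-scaled, whereas R² also penalizes the size of the errors, so a model can score well on one metric and poorly on the other.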