Özen Doğukan, Özen Hülya, Gül Elif Bersu, Olgaç Kemal Tuna, Tekin Koray, Tirpan Mehmet Borga, Akçay Ergun, Daşkin Ali
Faculty of Veterinary Medicine, Department of Biostatistics, Ankara University, Ankara, Turkiye.
Gulhane Faculty of Medicine, Department of Medical Informatics, University of Health Sciences, Ankara, Turkiye.
Vet Med Sci. 2025 Sep;11(5):e70539. doi: 10.1002/vms3.70539.
Reproductive efficiency is a crucial determinant of livestock productivity, and sperm quality is a key factor in successful fertilization. Quantitative assessment of spermatozoa with computer-assisted sperm analysis (CASA) yields kinetic variables that can vary across cattle breeds. This study aimed (i) to classify post-thaw semen samples from Holstein, Simmental and Charolais bulls based on eight CASA-derived variables: progressive motility (PM), non-progressive motility (non-PM), curvilinear velocity (VCL), straight-line velocity (VSL), beat-cross frequency (BCF), amplitude of lateral head displacement (ALH), hyperactivity and average path velocity (VAP); (ii) to benchmark three tree-based classifiers, C5.0, random forest (RF) and stochastic gradient boosting (SGB), for their ability to assign ejaculates to the correct breed; and (iii) to identify the most informative predictors for breed discrimination within the algorithms. The predictive performance of the three classifiers was compared after the original dataset was randomly divided into training and testing sets at ratios of 70%-30%, 75%-25% and 80%-20%. Model parameters were tuned using 10-fold cross-validation repeated ten times. SGB achieved the highest classification performance, with a mean balanced accuracy of 85.7% (86.4% for Holstein, 84.3% for Simmental and 86.5% for Charolais), followed by RF (83.5%) and C5.0 (73.5%). PM, hyperactivity and VSL were the most informative predictors. The results offer insights into breed-specific sperm characteristics, with potential implications for developing breed-specific CASA calibrations and for more efficient resource allocation in livestock production.
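The abstract does not state how balanced accuracy was computed for the three-breed problem. The sketch below assumes the common one-vs-rest convention, in which each breed's balanced accuracy is the mean of its sensitivity and specificity and the overall figure is the unweighted mean across breeds. The confusion matrix is a hypothetical toy example, not data from the study; only the three per-breed percentages are taken from the abstract.

```python
from statistics import mean

BREEDS = ["Holstein", "Simmental", "Charolais"]


def per_class_balanced_accuracy(confusion, labels):
    """Return one-vs-rest balanced accuracy for each class.

    confusion[i][j] is the number of samples whose true class is
    labels[i] and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    scores = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                       # class k predicted as k
        fn = sum(confusion[k]) - tp                # class k predicted as other
        fp = sum(row[k] for row in confusion) - tp # other predicted as class k
        tn = total - tp - fn - fp                  # everything else
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        scores[label] = (sensitivity + specificity) / 2
    return scores


# Hypothetical test-set confusion matrix (rows: true breed, columns: predicted).
toy_confusion = [
    [8, 1, 1],
    [1, 7, 2],
    [0, 1, 9],
]
scores = per_class_balanced_accuracy(toy_confusion, BREEDS)
mean_score = mean(scores.values())

# Per-breed balanced accuracies reported for SGB in the abstract (percent).
reported = {"Holstein": 86.4, "Simmental": 84.3, "Charolais": 86.5}
reported_mean = mean(reported.values())

print(scores, mean_score)
print(round(reported_mean, 1))
```

Under this assumption, the unweighted mean of the three reported per-breed values rounds to the 85.7% given for SGB, which is consistent with the overall figure being a simple average across breeds.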