Kanaka K K, Ganguly Indrajit, Singh Sanjeev, Kuralkar S V, Dixit Satpal, Sukhija Nidhi, Goli Rangasai Chandra
ICAR-National Bureau of Animal Genetic Resources, Karnal, 132001, India.
ICAR-Indian Institute of Agricultural Biotechnology, Ranchi, 834003, India.
Biochem Genet. 2025 Aug 22. doi: 10.1007/s10528-025-11230-z.
Identifying and classifying cattle populations by breed and utility is of considerable practical importance for effective breeding management. For accurate identification and classification of cattle breeds, a reference panel of 10 breeds, 657 identified ancestry informative markers and several machine learning classifiers were employed. To improve the accuracy of breed identification, three machine learning classification models (logistic regression, XGBoost, and random forest), each with an accuracy of > 95%, were combined into an ensemble that achieved an accuracy of > 98% with only 207 markers, termed breed informative markers (BIMs). Further, to classify dairy and draft purpose cattle, the BIMs were examined together with markers in selection signatures specific to dairy and draft utility, and 17 utility informative markers (UIMs), comprising 12 BIMs and 5 markers in selection signatures, were identified using an ensemble approach. The accuracy of classifying cattle by utility (dairy or draft) was > 96%. To demonstrate the application of UIMs, these markers were used to infer the utility of non-descript cattle of Maharashtra, India; many of these cattle were classified as draft purpose, consistent with their production performance. This information can further be used to make breeding decisions on grading them up to dairy or draft cattle. Here, we developed a novel pipeline that uses a [R-] reference panel, [A-] ancestry informative markers, [S-] selection signatures and the power of [EL-] ensemble machine learning to identify and classify cattle by breed and by utility; we call it RASEL (available at: https://github.com/kkokay07/RASEL).
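The abstract does not state how the outputs of the three classifiers are combined inside RASEL. As a hedged illustration only, the sketch below assumes simple soft voting: each already-trained model (logistic regression, XGBoost, random forest) supplies class probabilities for one animal, and the ensemble averages them. The function names `soft_vote`, `majority_vote` and `accuracy`, the labels `breed_A` to `breed_C`, and all probabilities are hypothetical and made up for illustration; no models are trained here.

```python
from collections import Counter


def soft_vote(prob_dicts, weights=None):
    """Combine per-model class probabilities for ONE animal by (weighted) averaging.

    prob_dicts: list of dicts, one per model, mapping class label -> probability.
    Returns (predicted_label, averaged_probabilities).
    Assumption: soft voting is used here as a stand-in; the abstract does not
    specify RASEL's actual combination rule.
    """
    if not prob_dicts:
        raise ValueError("need at least one model's probabilities")
    if weights is None:
        weights = [1.0] * len(prob_dicts)  # equal weight per model
    if len(weights) != len(prob_dicts):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    labels = set().union(*prob_dicts)  # every class seen by any model
    avg = {
        label: sum(w * p.get(label, 0.0) for w, p in zip(weights, prob_dicts)) / total
        for label in labels
    }
    # Sorting first makes tie-breaking deterministic (alphabetically first label wins).
    predicted = max(sorted(avg), key=avg.get)
    return predicted, avg


def majority_vote(predicted_labels):
    """Hard-voting alternative: the label predicted by most models wins."""
    return Counter(predicted_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of animals whose predicted label matches the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical probabilities for a single animal from the three model types.
logreg_probs = {"breed_A": 0.6, "breed_B": 0.3, "breed_C": 0.1}
xgboost_probs = {"breed_A": 0.4, "breed_B": 0.5, "breed_C": 0.1}
rforest_probs = {"breed_A": 0.5, "breed_B": 0.4, "breed_C": 0.1}

label, probs = soft_vote([logreg_probs, xgboost_probs, rforest_probs])
print(label)  # breed_A
```

In this toy case the XGBoost model alone would pick `breed_B`, but the averaged probabilities favour `breed_A`, which is the kind of error correction ensembling is meant to provide. The same functions would apply unchanged to the two-class dairy/draft utility problem, with labels such as `"dairy"` and `"draft"`.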