Sex identification in rainbow trout using genomic information and machine learning.

Author information

Kudinov Andrei A, Kause Antti

Affiliation

Natural Resources Institute Finland, 31600, Jokioinen, Finland.

Publication information

Genet Sel Evol. 2024 Dec 30;56(1):79. doi: 10.1186/s12711-024-00944-0.

Abstract

Sex identification in farmed fish is important for the management of fish stocks and breeding programs, but identification based on visual characteristics is typically difficult or impossible in juvenile or immature fish. The amount of genomic data obtained from farmed fish is rapidly growing with the implementation of genomic selection in aquaculture. In comparison to mammals and birds, ray-finned fishes exhibit a greater diversity of sex determination systems, with an absence of conserved genomic regions. A group of genomic markers located on a standard genotyping array has been reported to potentially be linked with sex determination in rainbow trout. However, the set of markers suitable for sex identification may vary between populations. Sex identification from genomic data is usually performed using probabilistic methods, where suitable markers are known beforehand. In our study, we demonstrated the use of the Extreme Gradient Boosting approach, a supervised machine learning method from the gradient boosting framework, to predict sex from unimputed genomic data when the suitability of the markers was unknown a priori. The accuracy of the method was assessed using four simulated datasets with different genotyping error rates and one real dataset from the Finnish Rainbow Trout Breeding Program. The method showed high prediction quality on both simulated and real datasets. For simulated datasets with low (5%) and high (50%) genotyping error rates, the accuracies were 1.0 and 0.60, respectively. In the real data, the method achieved a prediction accuracy of 98%, which is suitable for routine use.
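
The idea of learning which markers carry the sex signal, without naming them in advance, can be illustrated with a toy sketch. The code below is not the authors' pipeline and does not use XGBoost itself: it is a minimal gradient-boosted decision-stump classifier in plain Python, with leaf weights computed from first and second derivatives of the log loss in the spirit of XGBoost. Everything in it is an assumption made for illustration: the function names, the three sex-linked marker positions, the model in which males are heterozygous and females homozygous at those markers, and the error model in which a miscalled genotype is redrawn at random. Missing genotypes, which occur in unimputed data, are not modeled here. The accuracies it prints are not expected to match those reported in the abstract.

```python
import math
import random

SEX_LINKED = (0, 1, 2)  # hypothetical indices of sex-linked markers


def simulate(n_fish, n_snps, error_rate, rng):
    """Return genotypes coded 0/1/2 and sex labels (1 = male, 0 = female)."""
    X, y = [], []
    for _ in range(n_fish):
        male = rng.randint(0, 1)
        row = [rng.choice((0, 1, 2)) for _ in range(n_snps)]
        for j in SEX_LINKED:
            # Assumption: males heterozygous, females homozygous.
            row[j] = 1 if male else 0
        # Genotyping error: with probability error_rate a call is redrawn at random.
        row = [rng.choice((0, 1, 2)) if rng.random() < error_rate else g
               for g in row]
        X.append(row)
        y.append(male)
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_boosted_stumps(X, y, n_rounds=15, lr=0.3, lam=1.0):
    """Boost one-split trees on log loss; the model is told nothing about SEX_LINKED."""
    n, m = len(X), len(X[0])
    scores = [0.0] * n  # current log-odds for each fish
    stumps = []
    for _ in range(n_rounds):
        p = [sigmoid(s) for s in scores]
        g = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of log loss
        h = [pi * (1.0 - pi) for pi in p]      # second derivative of log loss
        best = None
        for j in range(m):
            for t in (0.5, 1.5):  # the only useful thresholds for 0/1/2 codes
                gl = hl = gr = hr = 0.0
                for i in range(n):
                    if X[i][j] < t:
                        gl += g[i]
                        hl += h[i]
                    else:
                        gr += g[i]
                        hr += h[i]
                gain = gl * gl / (hl + lam) + gr * gr / (hr + lam)
                if best is None or gain > best[0]:
                    best = (gain, j, t, gl / (hl + lam), gr / (hr + lam))
        _, j, t, wl, wr = best
        stumps.append((j, t, lr * wl, lr * wr))
        for i in range(n):
            scores[i] += lr * wl if X[i][j] < t else lr * wr
    return stumps


def predict(stumps, row):
    """Sum the leaf weights of all stumps; a positive score means male."""
    score = sum(wl if row[j] < t else wr for j, t, wl, wr in stumps)
    return 1 if score > 0 else 0


def run(error_rate, seed=1):
    """Train on 300 simulated fish and return accuracy on 200 held-out fish."""
    rng = random.Random(seed)
    X, y = simulate(500, 40, error_rate, rng)
    stumps = fit_boosted_stumps(X[:300], y[:300])
    hits = sum(predict(stumps, row) == label
               for row, label in zip(X[300:], y[300:]))
    return hits / 200


for rate in (0.05, 0.5):
    print(f"error rate {rate}: accuracy {run(rate):.2f}")
```

In this sketch, the markers that carry the sex signal can be read off afterwards from the feature indices stored in `stumps`. A production analysis would more plausibly use the XGBoost library directly, which also handles missing genotype calls natively.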
