Tabular deep learning: a comparative study applied to multi-task genome-wide prediction.

Affiliation

Research Unit of Mathematical Sciences, University of Oulu, P.O. Box 8000, 90014, University of Oulu, Finland.

Publication information

BMC Bioinformatics. 2024 Oct 4;25(1):322. doi: 10.1186/s12859-024-05940-1.

DOI:10.1186/s12859-024-05940-1
PMID:39367318
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11452967/
Abstract

PURPOSE

More accurate prediction of phenotype traits can increase the success of genomic selection in both plant and animal breeding studies and provide more reliable disease risk prediction in humans. Traditional approaches typically use regression models based on linear assumptions between the genetic markers and the traits of interest. Non-linear models have been considered as an alternative tool for modeling genomic interactions (i.e. non-additive effects) and other subtle non-linear patterns between markers and phenotype. Deep learning has become a state-of-the-art non-linear prediction method for sound, image and language data. However, genomic data is better represented in a tabular format. The existing literature on deep learning for tabular data proposes a wide range of novel architectures and reports successful results on various datasets. Tabular deep learning applications in genome-wide prediction (GWP) are still rare. In this work, we perform an overview of the main families of recent deep learning architectures for tabular data and apply them to multi-trait regression and multi-class classification for GWP on real gene datasets.

METHODS

The study involves an extensive overview of recent deep learning architectures for tabular data learning: NODE, TabNet, TabR, TabTransformer, FT-Transformer, AutoInt, GANDALF, SAINT and LassoNet. These architectures are applied to multi-trait GWP. Comprehensive benchmarks of various tabular deep learning methods are conducted to identify best practices and determine their effectiveness compared to traditional methods.
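To make the benchmark setting concrete, the following is a minimal sketch of a multi-trait regression baseline of the "traditional" linear kind that the tabular deep learning models are compared against. It is not the authors' pipeline: the simulated SNP matrix (coded 0/1/2), the ridge penalty `alpha`, the train/test split, and all function names are assumptions made for illustration only.

```python
import numpy as np

def simulate_genotypes(n_samples, n_markers, n_traits, seed=0):
    """Hypothetical toy data: SNPs coded 0/1/2 and additive multi-trait phenotypes."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n_samples, n_markers)).astype(float)
    # Assumption: only the first 10 markers carry true effects on each trait.
    B = np.zeros((n_markers, n_traits))
    B[:10] = rng.normal(size=(10, n_traits))
    Y = X @ B + rng.normal(scale=1.0, size=(n_samples, n_traits))
    return X, Y

def fit_ridge(X, Y, alpha):
    """Closed-form multi-output ridge regression on centered data."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def per_trait_correlation(Y_true, Y_pred):
    """Pearson correlation between observed and predicted values, one per trait."""
    return np.array([
        np.corrcoef(Y_true[:, k], Y_pred[:, k])[0, 1]
        for k in range(Y_true.shape[1])
    ])

# Rows are individuals, columns are markers: the tabular layout the abstract refers to.
X, Y = simulate_genotypes(n_samples=500, n_markers=100, n_traits=3)
X_train, X_test, Y_train, Y_test = X[:400], X[400:], Y[:400], Y[400:]

coef, intercept = fit_ridge(X_train, Y_train, alpha=10.0)
r = per_trait_correlation(Y_test, X_test @ coef + intercept)
print("held-out correlation per trait:", np.round(r, 3))
```

Any of the architectures listed above could be swapped in for `fit_ridge` under the same split and metric; the metric used here (per-trait Pearson correlation on held-out individuals) is a common choice in genomic prediction, but it is an assumption, not something stated in the abstract.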

RESULTS

Extensive experimental results on several genomic datasets (three for multi-trait regression and two for multi-class classification) highlight LassoNet as a standout performer, surpassing both the other tabular deep learning models and the highly efficient tree-based LightGBM method in prediction accuracy and computing efficiency.

CONCLUSION

Through a series of evaluations on real-world genomic datasets, the study identifies LassoNet as a standout performer, surpassing decision tree methods such as LightGBM and other tabular deep learning architectures in both predictive accuracy and computing efficiency. Moreover, the inherent variable selection property of LassoNet provides a systematic way to find important genetic markers that contribute to phenotype expression.
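The variable selection property mentioned above can be read off the LassoNet objective. The formulation below follows the original LassoNet paper rather than this abstract, and the symbols are that paper's notation, not the present authors': the network is a linear skip connection plus a feed-forward part, $f_{\theta,W}(x) = \theta^{\top}x + g_W(x)$, with $W^{(1)}_j$ denoting the first-layer weights attached to input feature (marker) $j$.

```latex
\begin{aligned}
\min_{\theta,\,W}\quad & L(\theta, W) + \lambda \,\lVert \theta \rVert_1 \\
\text{subject to}\quad & \lVert W^{(1)}_j \rVert_\infty \le M \,\lvert \theta_j \rvert,
\qquad j = 1, \dots, d .
\end{aligned}
```

Under this constraint, a marker whose skip-connection weight $\theta_j$ is shrunk to zero by the $\ell_1$ penalty also has all of its first-layer weights forced to zero, so it drops out of the network entirely. Increasing $\lambda$ therefore yields a path of progressively sparser marker sets.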

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5268/11452967/8ea22d38fd6e/12859_2024_5940_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5268/11452967/c7c03041388a/12859_2024_5940_Fig1_HTML.jpg
