ASAS-NANP symposium: mathematical modeling in animal nutrition: synthetic database generation for non-normal multivariate distributions: a rank-based method with application to ruminant methane emissions.

Author information

Tedeschi Luis O

Affiliation

Department of Animal Science, Texas A&M University, College Station, TX, USA.

Publication information

J Anim Sci. 2025 Jan 4;103. doi: 10.1093/jas/skaf136.

DOI: 10.1093/jas/skaf136
PMID: 40319357
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12351256/
Abstract

This study addresses the challenge of limited data availability in animal science, particularly in modeling complex biological processes such as methane emissions from ruminants. We propose a novel rank-based method for generating synthetic databases with correlated non-normal multivariate distributions aimed at enhancing the accuracy and reliability of predictive modeling tools. Our rank-based approach involves a four-step process: 1) fitting distributions to variables using normal or best-fit non-normal distributions, 2) generating synthetic databases, 3) preserving relationships among variables using Spearman correlations, and 4) cleaning datasets to ensure biological plausibility. We compare this method with copula-based approaches to maintain a preestablished correlation structure. The rank-based method demonstrated superior performance in preserving original distribution moments (mean, variance, skewness, kurtosis) and correlation structures compared to copula-based methods. We generated two synthetic databases (normal and non-normal distributions) and applied random forest (RF) and multiple linear model (LM) regression analyses. RF regression outperformed LM in predicting methane emissions, showing higher R² values (0.927 vs. 0.622) and lower standard errors. However, cross-testing revealed that RF regressions exhibit high specificity to distribution types, underperforming when applied to data with differing distributions. In contrast, LM regressions showed robustness across different distribution types. Our findings highlight the importance of understanding distributional assumptions in regression techniques when generating synthetic databases. The study also underscores the potential of synthetic data in augmenting limited samples, addressing class imbalances, and simulating rare scenarios. While our method effectively preserves descriptive statistical properties, we acknowledge the possibility of introducing artificial (unknown) relationships within subsets of the synthetic database. This research uncovered a practical solution for creating realistic, statistically sound datasets when original data is scarce or sensitive. Its application in predicting methane emissions demonstrates the potential to enhance modeling accuracy in animal science. Future research directions include integrating this approach with deep learning, exploring real-world applications, and developing adaptive machine-learning models for diverse data distributions.
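
The four-step process described in the abstract can be illustrated with a minimal Python sketch. This is not the author's implementation: it uses a generic Iman-Conover-style rank reordering as a stand-in, and it assumes NumPy is available. The variable names (`dmi`, `ch4`), the marginal distributions and their parameters, the target correlation of 0.7, and the plausibility bounds are all hypothetical values chosen for illustration. Step 1 (fitting distributions to real data) is skipped, and the marginals are simply assumed.

```python
import numpy as np

def ranks(x):
    """Return 0-based ranks of a 1-D array (assumes no ties, as with continuous draws)."""
    r = np.empty(len(x), dtype=int)
    r[np.argsort(x)] = np.arange(len(x))
    return r

def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return np.corrcoef(ranks(a), ranks(b))[0, 1]

def rank_reorder(marginals, score_corr, rng):
    """Reorder each column so that its ranks match those of correlated normal scores.

    The values in each column are only permuted, so every marginal moment
    (mean, variance, skewness, kurtosis) is left exactly unchanged.
    """
    n, k = marginals.shape
    chol = np.linalg.cholesky(score_corr)
    scores = rng.standard_normal((n, k)) @ chol.T  # correlated reference scores
    out = np.empty_like(marginals)
    for j in range(k):
        # Put the i-th smallest draw in the row holding the i-th smallest score.
        out[:, j] = np.sort(marginals[:, j])[ranks(scores[:, j])]
    return out

rng = np.random.default_rng(42)
n = 5000

# Steps 1-2 (assumed, not fitted): independent draws from non-normal marginals.
dmi = rng.gamma(shape=9.0, scale=1.0, size=n)     # hypothetical intake, kg/d
ch4 = rng.lognormal(mean=5.0, sigma=0.3, size=n)  # hypothetical methane, g/d
draws = np.column_stack([dmi, ch4])

# Step 3: impose a rank correlation structure (hypothetical target of 0.7).
target = np.array([[1.0, 0.7], [0.7, 1.0]])
synthetic = rank_reorder(draws, target, rng)

# Step 4: drop rows outside hypothetical plausibility bounds.
lower = np.array([2.0, 50.0])
upper = np.array([25.0, 500.0])
keep = np.all((synthetic >= lower) & (synthetic <= upper), axis=1)
cleaned = synthetic[keep]

rho_before = spearman(draws[:, 0], draws[:, 1])
rho_after = spearman(synthetic[:, 0], synthetic[:, 1])
print(f"Spearman before: {rho_before:.3f}, after: {rho_after:.3f}")
print(f"Rows kept after cleaning: {cleaned.shape[0]} of {n}")
```

In this sketch the target matrix is applied to the normal reference scores, so the Spearman correlation actually achieved is close to, but not exactly, 0.7. The cleaning step removes rows, which can shift the moments and correlations slightly, so both are worth rechecking on the cleaned data.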

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/115e/12351256/6c637cf55ea6/skaf136_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/115e/12351256/5441e575ffa1/skaf136_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/115e/12351256/32322e6a0a7f/skaf136_fig2.jpg

Similar articles

1
ASAS-NANP symposium: mathematical modeling in animal nutrition: synthetic database generation for non-normal multivariate distributions: a rank-based method with application to ruminant methane emissions.
J Anim Sci. 2025 Jan 4;103. doi: 10.1093/jas/skaf136.
2
Prescription of Controlled Substances: Benefits and Risks
3
Sexual Harassment and Prevention Training
4
Aspects of Genetic Diversity, Host Specificity and Public Health Significance of Single-Celled Intestinal Parasites Commonly Observed in Humans and Mostly Referred to as 'Non-Pathogenic'.
APMIS. 2025 Sep;133(9):e70036. doi: 10.1111/apm.70036.
5
Approaches for predicting dairy cattle methane emissions: from traditional methods to machine learning.
J Anim Sci. 2024 Jan 3;102. doi: 10.1093/jas/skae219.
6
Survivor, family and professional experiences of psychosocial interventions for sexual abuse and violence: a qualitative evidence synthesis.
Cochrane Database Syst Rev. 2022 Oct 4;10(10):CD013648. doi: 10.1002/14651858.CD013648.pub2.
7
Does the Presence of Missing Data Affect the Performance of the SORG Machine-learning Algorithm for Patients With Spinal Metastasis? Development of an Internet Application Algorithm.
Clin Orthop Relat Res. 2024 Jan 1;482(1):143-157. doi: 10.1097/CORR.0000000000002706. Epub 2023 Jun 12.
8
Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone.
Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23.
9
Are Current Survival Prediction Tools Useful When Treating Subsequent Skeletal-related Events From Bone Metastases?
Clin Orthop Relat Res. 2024 Sep 1;482(9):1710-1721. doi: 10.1097/CORR.0000000000003030. Epub 2024 Mar 22.
10
Plug-and-play use of tree-based methods: consequences for clinical prediction modeling.
J Clin Epidemiol. 2025 Aug;184:111834. doi: 10.1016/j.jclinepi.2025.111834. Epub 2025 May 19.

Cited by

1
Development of Machine Learning-Based Sub-Models for Predicting Net Protein Requirements in Lactating Dairy Cows.
Animals (Basel). 2025 Jul 18;15(14):2127. doi: 10.3390/ani15142127.
2
ASAS-NANP SYMPOSIUM: Mathematical Modeling in Animal Nutrition: Training the Future Generation in Data and Predictive Analytics for Sustainable Development. A Summary of the 2023 Symposium.
J Anim Sci. 2025 May 4. doi: 10.1093/jas/skaf141.
