Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Estimating Sample Size and Reducing Overfitting.

Authors

Ghasemzadeh Hamzeh, Hillman Robert E, Mehta Daryush D

Affiliations

Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston.

Department of Surgery, Harvard Medical School, Boston, MA.

Publication information

J Speech Lang Hear Res. 2024 Mar 11;67(3):753-781. doi: 10.1044/2023_JSLHR-23-00273. Epub 2024 Feb 22.

DOI: 10.1044/2023_JSLHR-23-00273
PMID: 38386017
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11005022/

Abstract

PURPOSE

Many studies using machine learning (ML) in speech, language, and hearing sciences rely upon cross-validations with single data splitting. This study's first purpose is to provide quantitative evidence that would incentivize researchers to instead use the more robust data splitting method of nested k-fold cross-validation. The second purpose is to present methods and MATLAB code to perform power analysis for ML-based analysis during the design of a study.
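For readers unfamiliar with the term, the data flow of nested k-fold cross-validation can be sketched as follows. This is an illustrative Python sketch, not the authors' MATLAB code; the function names `split_into_folds` and `nested_k_fold` and all parameter defaults are assumptions made for this example.

```python
import random


def split_into_folds(indices, k):
    """Partition a list of indices into k disjoint, near-equal folds."""
    return [indices[i::k] for i in range(k)]


def nested_k_fold(n_samples, k_outer=10, k_inner=10, seed=0):
    """Yield (inner_splits, test_indices) once per outer fold.

    inner_splits is a list of (train, validation) index pairs drawn only
    from the samples that lie outside the current outer test fold.
    """
    if k_outer < 2 or k_inner < 2 or n_samples < k_outer:
        raise ValueError("need at least 2 folds and n_samples >= k_outer")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    outer_folds = split_into_folds(indices, k_outer)
    for i, test in enumerate(outer_folds):
        # Everything except the outer test fold is available for development.
        development = [s for j, fold in enumerate(outer_folds)
                       if j != i for s in fold]
        inner_folds = split_into_folds(development, k_inner)
        inner_splits = []
        for m, validation in enumerate(inner_folds):
            train = [s for j, fold in enumerate(inner_folds)
                     if j != m for s in fold]
            inner_splits.append((train, validation))
        yield inner_splits, test


if __name__ == "__main__":
    for inner_splits, test in nested_k_fold(100):
        print(len(inner_splits), "inner splits;", len(test), "held-out test samples")
```

In this layout, feature selection and tuning would use only the inner (train, validation) pairs, and each outer test fold would be scored once, after those choices are fixed.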

METHOD

First, the significant impact of different cross-validations on ML outcomes was demonstrated using real-world clinical data. Then, Monte Carlo simulations were used to quantify the interactions among the employed cross-validation method, the discriminative power of features, the dimensionality of the feature space, the dimensionality of the model, and the sample size. Four different cross-validation methods (single holdout, 10-fold, train-validation-test, and nested 10-fold) were compared based on the statistical power and confidence of the resulting ML models. Distributions of the null and alternative hypotheses were used to determine the minimum required sample size for obtaining a statistically significant outcome (5% significance) with 80% power. Statistical confidence of the model was defined as the probability of correct features being selected for inclusion in the final model.
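The sample-size logic described above (a null distribution, an alternative distribution, 5% significance, 80% power) can be illustrated with a deliberately simplified Monte Carlo sketch. It is not the authors' computational model: it assumes a single Gaussian feature, a midpoint-threshold classifier, and one train/test split per trial, and the names `holdout_accuracy`, `estimate_power`, and `minimum_sample_size` are hypothetical.

```python
import math
import random


def holdout_accuracy(n_per_class, effect_size, rng):
    """One trial: fit a midpoint threshold on half the data, test on the rest."""
    if n_per_class < 2:
        raise ValueError("need at least 2 samples per class")
    class0 = [rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
    class1 = [rng.gauss(effect_size, 1.0) for _ in range(n_per_class)]
    half = n_per_class // 2
    mean0 = sum(class0[:half]) / half
    mean1 = sum(class1[:half]) / half
    threshold = (mean0 + mean1) / 2.0
    direction = 1.0 if mean1 >= mean0 else -1.0
    test0, test1 = class0[half:], class1[half:]
    correct = sum(direction * (x - threshold) < 0 for x in test0)
    correct += sum(direction * (x - threshold) >= 0 for x in test1)
    return correct / (len(test0) + len(test1))


def estimate_power(n_per_class, effect_size, n_trials=2000, alpha=0.05, seed=0):
    """Fraction of alternative-hypothesis trials that beat the null critical value."""
    rng = random.Random(seed)
    # Null hypothesis: the feature carries no class information (effect size 0).
    null = sorted(holdout_accuracy(n_per_class, 0.0, rng) for _ in range(n_trials))
    alt = [holdout_accuracy(n_per_class, effect_size, rng) for _ in range(n_trials)]
    k = min(n_trials - 1, math.ceil((1 - alpha) * n_trials) - 1)
    critical = null[k]  # empirical (1 - alpha) quantile of null accuracy
    return sum(a > critical for a in alt) / n_trials


def minimum_sample_size(effect_size, candidates, target_power=0.80, **kwargs):
    """Smallest candidate n_per_class whose estimated power reaches the target."""
    for n in candidates:
        if estimate_power(n, effect_size, **kwargs) >= target_power:
            return n
    return None


if __name__ == "__main__":
    print(minimum_sample_size(0.8, [10, 20, 40, 80, 160], n_trials=1000))
```

The effect size here is the class-mean separation in standard-deviation units. The study additionally varies the cross-validation method, feature-space dimensionality, and model dimensionality, which this sketch leaves out.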

RESULTS

ML models generated based on the single holdout method had very low statistical power and confidence, leading to overestimation of classification accuracy. Conversely, the nested 10-fold cross-validation method resulted in the highest statistical confidence and power while also providing an unbiased estimate of accuracy. The required sample size using the single holdout method could be 50% higher than what would be needed if nested k-fold cross-validation were used. Statistical confidence in the model based on nested k-fold cross-validation was as much as four times higher than the confidence obtained with the single holdout-based model. A computational model, MATLAB code, and lookup tables are provided to assist researchers with estimating the minimum sample size needed during study design.
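Statistical confidence, as defined in the Method section (the probability that the correct features are selected), can be estimated by simulation. The sketch below only illustrates that definition under strong assumptions: one informative feature (index 0) among pure-noise features, selection by the largest absolute difference in class means, and no cross-validation at all. It does not reproduce the comparison reported above, and the function names are hypothetical.

```python
import random


def select_feature(n_per_class, n_features, effect_size, rng):
    """Return the index of the feature with the largest class-mean gap.

    Feature 0 is the only informative feature; all others are noise.
    """
    best_index, best_gap = 0, -1.0
    for j in range(n_features):
        shift = effect_size if j == 0 else 0.0
        mean0 = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_class)) / n_per_class
        mean1 = sum(rng.gauss(shift, 1.0) for _ in range(n_per_class)) / n_per_class
        gap = abs(mean1 - mean0)
        if gap > best_gap:
            best_index, best_gap = j, gap
    return best_index


def selection_confidence(n_per_class, n_features, effect_size,
                         n_trials=1000, seed=0):
    """Fraction of simulated datasets in which the informative feature wins."""
    rng = random.Random(seed)
    hits = sum(
        select_feature(n_per_class, n_features, effect_size, rng) == 0
        for _ in range(n_trials)
    )
    return hits / n_trials


if __name__ == "__main__":
    for n in (5, 20, 100):
        print(n, selection_confidence(n, n_features=10, effect_size=0.5))
```

With a fixed effect size, the estimated confidence rises with sample size, because the noise features' class-mean gaps shrink toward zero.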

CONCLUSION

The adoption of nested k-fold cross-validation is critical for unbiased and robust ML studies in the speech, language, and hearing sciences.

SUPPLEMENTAL MATERIAL

https://doi.org/10.23641/asha.25237045.
