
Putting the data before the algorithm in big data addressing personalized healthcare.

Authors

Cahan Eli M, Hernandez-Boussard Tina, Thadaney-Israni Sonoo, Rubin Daniel L

Affiliations

1. New York University School of Medicine, New York, NY, USA.

2. Department of Pediatric Orthopaedics, Stanford University, Palo Alto, CA, USA.

Publication details

NPJ Digit Med. 2019 Aug 19;2:78. doi: 10.1038/s41746-019-0157-2. eCollection 2019.

DOI: 10.1038/s41746-019-0157-2
PMID: 31453373
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6700078/
Abstract

Technologies leveraging big data, including predictive algorithms and machine learning, are playing an increasingly important role in the delivery of healthcare. However, evidence indicates that such algorithms have the potential to worsen disparities currently intrinsic to the contemporary healthcare system, including racial biases. Blame for these deficiencies has often been placed on the algorithm, but the underlying training data bears greater responsibility for these errors, as biased outputs are inexorably produced by biased inputs. The utility, equity, and generalizability of predictive models depend on population-representative training data with robust feature sets. So while the conventional paradigm of big data is deductive in nature (clinical decision support), a future model harnesses the potential of big data for inductive reasoning. This may be conceptualized as clinical decision questioning, intended to liberate the human predictive process from preconceived lenses in data solicitation and/or interpretation. Efficacy, representativeness, and generalizability are all heightened in this schema. Thus, the possible risks of biased big data arising from the inputs themselves must be acknowledged and addressed. Awareness of data deficiencies, structures for data inclusiveness, strategies for data sanitation, and mechanisms for data correction can help realize the potential of big data for a personalized medicine era. Applied deliberately, these considerations could help mitigate risks of perpetuation of health inequity amidst widespread adoption of novel applications of big data.
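
The abstract's point that models depend on population-representative training data can be illustrated with a simple representation audit. The following is a minimal sketch, not the authors' method: the function names (`representation_gaps`, `underrepresented`), the 5-percentage-point tolerance, and the group labels and shares are all hypothetical, and the data is made up for illustration.

```python
from collections import Counter


def representation_gaps(train_groups, reference_shares):
    """Return, per group, (share in training data) - (share in reference population).

    train_groups: one group label per training record (hypothetical input).
    reference_shares: dict mapping group label -> expected population share.
    """
    counts = Counter(train_groups)
    total = len(train_groups)
    gaps = {}
    for group, ref_share in reference_shares.items():
        train_share = counts.get(group, 0) / total if total else 0.0
        gaps[group] = train_share - ref_share
    return gaps


def underrepresented(gaps, tolerance=0.05):
    """List groups whose training share falls short of the reference share
    by more than `tolerance` (an arbitrary, assumed threshold)."""
    return sorted(group for group, gap in gaps.items() if gap < -tolerance)


# Made-up example: 100 training records across three hypothetical groups.
train_groups = ["A"] * 80 + ["B"] * 15 + ["C"] * 5
reference_shares = {"A": 0.60, "B": 0.25, "C": 0.15}

gaps = representation_gaps(train_groups, reference_shares)
flagged = underrepresented(gaps)
print(flagged)
```

A check like this only surfaces gaps in group counts; it says nothing about feature quality or label bias, which the abstract also treats as data deficiencies.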


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f13/6700078/eb9ed534aa3f/41746_2019_157_Fig1_HTML.jpg

