Miller D Douglas
Medical College of Georgia (GB 3330), Augusta, GA 30912 USA.
NPJ Digit Med. 2019 Jun 28;2:62. doi: 10.1038/s41746-019-0138-5. eCollection 2019.
Machine learning (ML) and its parent technology trend, artificial intelligence (AI), are deriving novel insights from ever larger and more complex datasets. Efficient and accurate AI analytics require fastidious data science: the careful curation of knowledge representations in databases, decomposition of data matrices to reduce dimensionality, and preprocessing of datasets to mitigate the confounding effects of messy (i.e., missing, redundant, and outlier) data. Messier, bigger, and more dynamic medical datasets create the potential for ML computing systems querying databases to draw erroneous data inferences, portending real-world human health consequences. High-dimensional medical datasets can be static or dynamic. For example, principal component analysis (PCA) used within R computing packages can speed and scale disease association analytics for deriving polygenic risk scores from static gene-expression microarrays. Robust PCA of -dimensional subspace data accelerates image acquisition and reconstruction of dynamic 4-D magnetic resonance imaging studies, enhancing tracking of organ physiology, tissue relaxation parameters, and contrast agent effects. Unlike users in other data-dense business and scientific sectors, medical AI users must be aware that input data quality limitations can have health implications, potentially reducing analytic model accuracy for predicting clinical disease risks and patient outcomes. As AI technologies find more health applications, physicians should contribute their health domain expertise to rules-/ML-based computer system development, inform input data provenance, and recognize the importance of data preprocessing quality assurance when interpreting the clinical implications of intelligent machine outputs to patients.
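The abstract names three kinds of messy data (missing, redundant, and outlier values) and PCA as a matrix decomposition for reducing dimensionality. The sketch below chains those steps on a synthetic samples-by-features matrix using NumPy. It is an illustration under stated assumptions, not the author's method. The helper names `preprocess` and `pca` are hypothetical. The specific choices are also assumptions made here: median imputation, exact-duplicate removal, and clipping at 3 robust z-scores (median/MAD). The example uses standard PCA, not the robust PCA mentioned for MRI reconstruction, which is a different technique.

```python
import numpy as np


def preprocess(X, z_thresh=3.0):
    """Clean a samples-by-features matrix (hypothetical helper; assumed steps)."""
    X = np.array(X, dtype=float)  # copy so the caller's array is untouched

    # Missing data (assumption: median imputation): replace each NaN with its column median.
    col_median = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_median[cols]

    # Redundant data: drop exact duplicate rows, keeping first occurrences in original order.
    _, first_idx = np.unique(X, axis=0, return_index=True)
    X = X[np.sort(first_idx)]

    # Outliers: clip values beyond z_thresh robust z-scores (median and scaled MAD) per column.
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) * 1.4826  # MAD scaled to approximate a normal SD
    mad[mad == 0] = np.inf  # skip clipping for columns whose MAD is zero
    return np.clip(X, med - z_thresh * mad, med + z_thresh * mad)


def pca(X, n_components=2):
    """Standard PCA via SVD of the column-centered matrix (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # sample coordinates on the leading PCs
    explained = S**2 / np.sum(S**2)                  # fraction of total variance per PC
    return scores, Vt[:n_components], explained[:n_components]


# Synthetic data: 4 features driven by one latent factor plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
X = latent @ np.array([[1.0, 2.0, -1.0, 0.5]]) + 0.05 * rng.normal(size=(50, 4))
X[3, 2] = np.nan  # missing value
X[10] = X[9]      # redundant duplicate record
X[20, 0] = 40.0   # gross outlier

clean = preprocess(X)
scores, components, explained = pca(clean, n_components=2)
print(clean.shape, scores.shape, explained.round(3))
```

Because the synthetic features share one latent factor, most of the variance should load on the first component once the three defects are handled. For comparison, R's `prcomp()` also computes PCA from a singular value decomposition of the centered (and optionally scaled) data matrix.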