Wei Jing, Xiang Jie, Yasin Yousef, Barszczyk Andrew, Wah Deanne Tak On, Yu Meifen, Huang Wendy Wenyu, Feng Zhong-Ping, Lee Kang, Luo Hong
The Affiliated Hospital of Hangzhou Normal University, Hangzhou Normal University, Hangzhou, Zhejiang, People's Republic of China.
Department of Applied Psychology and Human Development, Ontario Institute for Studies in Education, University of Toronto, Toronto, Ontario, Canada.
Asian Pac J Cancer Prev. 2021 Feb 1;22(2):333-340. doi: 10.31557/APJCP.2021.22.2.333.
Serum protein concentrations are diagnostically and prognostically valuable in cancer and other diseases, but their measurement via blood test is uncomfortable, inconvenient, and costly. This study investigates the possibility of predicting albumin, globulin, and albumin-globulin ratio from easily accessible physical characteristics (height, weight, Body Mass Index, age, gender) and vital signs (systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, pulse) using advanced machine learning techniques.
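Several of the listed predictors can be derived from the directly measured ones. The sketch below shows one way to compute them. The function name, argument names, and units are illustrative assumptions, and the mean arterial pressure formula is the common clinical approximation (diastolic pressure plus one third of pulse pressure), which the abstract does not confirm as the formula used in the study.

```python
def derived_predictors(height_cm, weight_kg, systolic, diastolic):
    """Derive BMI, pulse pressure, and mean arterial pressure.

    Hypothetical helper: names and units (cm, kg, mmHg) are assumptions.
    """
    height_m = height_cm / 100.0
    # Body mass index: weight in kg divided by height in metres squared.
    bmi = weight_kg / height_m ** 2
    # Pulse pressure: systolic minus diastolic blood pressure.
    pulse_pressure = systolic - diastolic
    # Assumption: common approximation, not confirmed by the source.
    mean_arterial_pressure = diastolic + pulse_pressure / 3.0
    return {
        "bmi": bmi,
        "pulse_pressure": pulse_pressure,
        "mean_arterial_pressure": mean_arterial_pressure,
    }


# Illustrative input values only, not study data.
example = derived_predictors(height_cm=170, weight_kg=65, systolic=120, diastolic=80)
```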
We obtained albumin concentration, globulin concentration, albumin-globulin ratio and predictor information (physical characteristics, vital signs) from physical exam records of 46,951 healthy adult participants in Hangzhou, China. We trained a computational model to predict each serum protein concentration from the predictors and then evaluated the predictive accuracy of each model on an independent portion of the dataset that was not used in model training. We also determined the relative importance of each feature within the model.
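The evaluation design described above (fit on one portion of the data, score on a held-out portion) can be sketched as follows. This is a minimal illustration only: the abstract does not specify the model type or the split proportions, so a one-variable least-squares line and an 80/20 split stand in as assumptions, and the data are synthetic values generated for the example rather than study measurements.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Synthetic data for illustration only (not from the study).
rng = random.Random(0)
ages = [rng.uniform(20, 80) for _ in range(1000)]
targets = [50.0 - 0.1 * age + rng.gauss(0, 3) for age in ages]

# Assumed 80/20 split; the held-out indices are never used for fitting.
indices = list(range(len(ages)))
rng.shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]

slope, intercept = fit_line(
    [ages[i] for i in train_idx], [targets[i] for i in train_idx]
)
predictions = [slope * ages[i] + intercept for i in test_idx]
held_out_r = pearson_r(predictions, [targets[i] for i in test_idx])
```

Scoring on data the model never saw is what makes the reported correlation an estimate of predictive accuracy rather than of goodness of fit.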
Prediction accuracies were r=0.540 (95% CI: 0.539-0.540; Pearson r) for albumin, r=0.250 (95% CI: 0.249-0.251) for globulin, and r=0.373 (95% CI: 0.372-0.374) for albumin-globulin ratio. The most important predictive features were age (100% ± 0.0%; mean ± 95% CI of normalized importance), gender (34.4% ± 0.7%), pulse (25.6% ± 1.3%) and Body Mass Index (24.4% ± 2.3%) for albumin, pulse (83.7% ± 3.8%) for globulin, and age (99.2% ± 1.0%), gender (59.2% ± 1.7%), Body Mass Index (46.1% ± 4.2%) and height (40.0% ± 3.8%) for albumin-globulin ratio.
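A top feature reported at 100% ± 0.0% is what one would see if each run's importance scores are rescaled so that the largest equals 100% before averaging. The sketch below illustrates that reading. The scaling rule, the number of runs, the normal-approximation interval (1.96 standard errors), and the raw scores are all assumptions made for the example, since the abstract does not describe how the importances or their intervals were computed.

```python
import math
import statistics


def normalize_run(raw_importance):
    """Rescale one run's scores so the largest feature equals 100%."""
    top = max(raw_importance.values())
    return {name: value / top * 100.0 for name, value in raw_importance.items()}


def mean_and_ci95(values):
    """Mean and 95% CI half-width, assuming a normal approximation."""
    mean = statistics.fmean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


# Hypothetical raw importance scores from three repeated runs (not study data).
runs = [
    {"age": 0.40, "gender": 0.14, "pulse": 0.10},
    {"age": 0.42, "gender": 0.15, "pulse": 0.11},
    {"age": 0.38, "gender": 0.13, "pulse": 0.10},
]

normalized = [normalize_run(run) for run in runs]
summary = {
    name: mean_and_ci95([run[name] for run in normalized]) for name in runs[0]
}
```

Under this scheme a feature that ranks first in every run has zero spread after rescaling, so its interval collapses to ± 0.0%.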
Our models predicted serum protein concentrations with appreciable accuracy, showing the promise of this approach. Such models could augment existing tools for identifying "at-risk" individuals for follow-up with a blood test.