Kodera Satomi, Yokoi Osamu, Kaneko Masaki, Sato Yuka, Ito Susumu, Hata Katsuhiko
KYB Medical Service Co., LTD, Tokyo, Japan.
Department of Neuroscience, Research Centre for Mathematical Medicine, Tokyo, Japan.
J Clin Lab Anal. 2025 Jul;39(14):e70064. doi: 10.1002/jcla.70064. Epub 2025 Jun 12.
From a preventive medicine perspective, this study aims to clarify the role of screening data in aging and health problems by estimating age from screening data and by determining how many data items are required from widely used screening tests.
A random forest model was applied to data from 11,554 individuals (3,043 men and 8,511 women) aged 0-95 years who underwent screening tests (60 blood tests, 8 urine tests, and 2 saliva tests) between February 2020 and August 2023. All analyses were conducted in Python 3.10.12.
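The modelling step described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: it assumes scikit-learn and NumPy (the abstract names only Python 3.10.12), uses synthetic data in place of the screening items, and treats R as the Pearson correlation between estimated and actual age, which the abstract does not define. The hyperparameters and variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for screening data (assumption): 1000 people, 10 numeric items.
n_people, n_items = 1000, 10
age = rng.uniform(0, 95, size=n_people)
X = rng.normal(size=(n_people, n_items))
X[:, 0] += 0.05 * age  # item 0 drifts upward with age
X[:, 1] -= 0.03 * age  # item 1 drifts downward with age

# Hold out 20% of the records, mirroring the 80% training share in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, age, test_size=0.2, random_state=0
)

# Fit a random forest regressor that estimates age from the items.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
estimated_age = model.predict(X_test)

# Pearson correlation between estimated and actual age on held-out records
# (assumed here to correspond to the abstract's R).
r = np.corrcoef(estimated_age, y_test)[0, 1]
print(f"train={len(X_train)} test={len(X_test)} R={r:.3f}")
```

With real screening data, `X` would hold the measured items plus an encoded gender column and `age` the chronological ages; the synthetic arrays here only make the sketch runnable.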
Using all 71 items, including gender, the model achieved a high accuracy of R = 0.7010 with 9,243 training datasets (80% of the total). R decreased only slightly, to 0.6937, when the data items were reduced to 15 by removing less important variables. When the number of datasets fell below 800 or the number of data items fell below 7, R dropped below 0.6. Notably, postmenopausal women tended to have higher estimated ages than premenopausal women.
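One plausible way to reduce the number of data items by removing less important variables is to rank items by the forest's importance scores and refit on the top-ranked subset. The sketch below assumes scikit-learn's impurity-based `feature_importances_` as the ranking criterion and Pearson correlation as R. The abstract specifies neither, and the helper `held_out_r`, the synthetic data, and the choice of `k` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def held_out_r(X, y, seed=0):
    """Fit a forest on 80% of the rows; return (Pearson R on the other 20%, model)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_tr, y_tr)
    r = np.corrcoef(model.predict(X_te), y_te)[0, 1]
    return r, model


rng = np.random.default_rng(1)

# Synthetic stand-in (assumption): 12 items, of which only items 0-2 track age.
n_people, n_items = 1000, 12
age = rng.uniform(0, 95, size=n_people)
X = rng.normal(size=(n_people, n_items))
X[:, :3] += np.outer(age, [0.05, -0.04, 0.03])

# Baseline: all items.
r_all, model_all = held_out_r(X, age)

# Rank items by importance (computed on the training rows) and keep the top k.
k = 3
top_items = np.argsort(model_all.feature_importances_)[::-1][:k]
r_top, _ = held_out_r(X[:, top_items], age)

print(f"all items: R={r_all:.3f}  top {k} items: R={r_top:.3f}")
```

Repeating the last step for a range of `k` values, or for progressively smaller training subsets, gives the kind of accuracy-versus-size curve the results paragraph summarizes.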
Age estimation from blood data using the random forest model (blood age) is sufficiently precise for assessing the state of physical aging. Like other biological ages estimated from various omics estimators, blood age is a very promising method for exploring aging-related problems such as metabolic syndrome and frailty syndrome.