School of Business and Economics, Humboldt University of Berlin, Unter den Linden 6, 10099, Berlin, Germany.
Faculty of Biology, Zaporizhzhia National University, Zhukovskogo st., 10, Zaporizhzhia, 69600, Ukraine.
Biogerontology. 2020 Dec;21(6):731-744. doi: 10.1007/s10522-020-09890-y. Epub 2020 Jul 6.
In this paper, I build deep neural networks with various structures and hyperparameters to predict human chronological age from open-access biochemical indicators and their specifications in the NHANES database. In total, 1152 neural networks are trained and tested. The algorithms are trained and tested on incomplete data: missing values in data records are imputed with the mean or median value of each parameter. I select the best neural networks by validation accuracy, measured by the coefficient of determination and the mean absolute error. The most accurate results are delivered by multilayer networks (6 layers) with recurrent layers. Neural network types are selected by trial and error. The algorithms reached a coefficient of determination of 78% and a mean absolute error of 6.5. I also list empirically determined features of neural networks that increase accuracy for the task of chronological age prediction. The obtained results can be considered an approximation of human biological age. Parameters in the training datasets are selected as broadly as possible: all potentially relevant parameters (926) from the NHANES database are used. Although the networks are trained on incomplete data, they demonstrated the ability to make reasonable predictions (with R > 0.7) based on no more than 100 biochemical indicators. Hence, for practical purposes, full data on each of the 926 indicators are not required, although analysis of the impact of each indicator is useful for theoretical developments.
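The imputation step and the two accuracy measures named in the abstract can be illustrated with a minimal sketch. This is not the author's code: the function names (`impute_columns`, `r2_score`, `mean_absolute_error`) are hypothetical, missing values are assumed to be encoded as `None`, and the sample numbers are made up for illustration only.

```python
from statistics import mean, median


def impute_columns(rows, strategy="mean"):
    """Fill missing entries (None) in each column with that column's mean or median.

    Assumption: every column has at least one observed value; otherwise
    statistics.mean/median raises StatisticsError.
    """
    if strategy not in ("mean", "median"):
        raise ValueError("strategy must be 'mean' or 'median'")
    stat = mean if strategy == "mean" else median
    n_cols = len(rows[0])
    # One fill value per column, computed from the observed entries only.
    fills = [stat([row[j] for row in rows if row[j] is not None])
             for j in range(n_cols)]
    return [[fills[j] if row[j] is None else row[j] for j in range(n_cols)]
            for row in rows]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative (made-up) records: two indicators, some values missing.
records = [[1, 10], [2, None], [None, 20], [9, 30]]
filled_mean = impute_columns(records, "mean")
filled_median = impute_columns(records, "median")

# Illustrative (made-up) ages and predictions.
ages = [20, 40, 60]
predicted = [25, 40, 55]
r2 = r2_score(ages, predicted)
mae = mean_absolute_error(ages, predicted)
```

In this sketch the mean and median strategies give different fill values for the first column, which is why the choice between them can be treated as one more setting to compare across trained networks.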