Wei Zhipeng, Ding Shiying, Duan Meiyu, Liu Shuai, Huang Lan, Zhou Fengfeng
Health Informatics Lab, College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, Jilin, 130012, China.
Health Informatics Lab, College of Software, and Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun, Jilin, 130012, China.
Comput Biol Med. 2020 Oct;125:104008. doi: 10.1016/j.compbiomed.2020.104008. Epub 2020 Sep 26.
Accurately determining a sample's chronological age is an important forensic problem. Selecting appropriate methylomic features may improve performance on this regression problem. Most existing feature selection algorithms, however, optimize regression performance by considering only the original features. This study proposed four feature engineering strategies to transform the original methylomic features. The performance of the age regression model was improved by FeSTwo, the resampling-based feature selection algorithm proposed in this study. FeSTwo outperformed the parallel algorithms used in previous studies, even with the electronic health record data. The age prediction performance of the FeSTwo-detected features was also confirmed on another independent dataset. The results demonstrated that the proposed model, FeSTwo, achieved a reduction of more than 8% in root-mean-square error (RMSE) on the test dataset with only 70 features.
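The abstract does not describe how FeSTwo works internally, so the following is only a generic sketch of what a resampling-based feature selection step evaluated by RMSE might look like. It is not the authors' algorithm. The helper names (`rmse`, `resampling_feature_selection`, `fit_predict`), the correlation-based ranking, the bootstrap scheme, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def resampling_feature_selection(X, y, n_keep, n_rounds=50, seed=0):
    """Hypothetical helper (not FeSTwo): over bootstrap resamples, count how
    often each feature ranks in the top n_keep by absolute Pearson correlation
    with y, then return the n_keep most frequently selected feature indices."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features, dtype=int)
    for _ in range(n_rounds):
        idx = rng.integers(0, n_samples, size=n_samples)  # sample with replacement
        Xc = X[idx] - X[idx].mean(axis=0)
        yc = y[idx] - y[idx].mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
        corr = np.abs(Xc.T @ yc) / denom
        counts[np.argsort(corr)[-n_keep:]] += 1
    return np.sort(np.argsort(counts)[-n_keep:])

def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept, used as a stand-in regressor."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

# Synthetic stand-in for methylomic data: only features 0, 1, 2 carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 30))
y = 40 + 10 * X[:, 0] - 8 * X[:, 1] + 6 * X[:, 2] + rng.normal(size=300)
X_train, X_test, y_train, y_test = X[:200], X[200:], y[:200], y[200:]

selected = resampling_feature_selection(X_train, y_train, n_keep=3)
rmse_all = rmse(y_test, fit_predict(X_train, y_train, X_test))
rmse_sel = rmse(y_test, fit_predict(X_train[:, selected], y_train, X_test[:, selected]))
relative_change = (rmse_all - rmse_sel) / rmse_all  # positive means RMSE dropped
print(selected, rmse_all, rmse_sel, relative_change)
```

A relative RMSE reduction such as the "more than 8%" reported in the abstract would correspond to `relative_change > 0.08` in this notation, measured against whichever baseline the study used.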