

Optimization scheme of machine learning model for genetic division between northern Han, southern Han, Korean and Japanese.

Affiliations

Key Laboratory of Tianjin for Epigenetics, Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China.

Key Laboratory of Phylogeny and Comparative Genomics of Jiangsu Province, Jiangsu Normal University, Xuzhou 221116, China.

Publication information

Yi Chuan. 2022 Nov 20;44(11):1028-1043. doi: 10.16288/j.yczz.22-073.

Abstract

Han Chinese, Korean and Japanese are the main populations of East Asia, and Han Chinese show a north-to-south gradient of admixture. These East Asian populations differ in genetic structure. To achieve fine-scale genetic classification of southern (S-) and northern (N-) Han Chinese, Korean and Japanese individuals, we collected and analyzed 1185 ancestry-informative SNPs (AISNPs) from previous literature reports and our own laboratory findings. First, two machine learning algorithms, softmax and randomForest, were used to build genetic classification models. Then, phylogenetic trees, STRUCTURE and principal component analysis were used to evaluate the classification performance of different AISNP panels. The 234-AISNP panel achieved fine-scale differentiation among the target populations in the four classification schemes. The softmax model reached an accuracy of 92%, correctly classifying S-Han, N-Han, Korean and Japanese individuals. The two machine learning models tested in this study provide an important reference for high-resolution discrimination of closely related populations and will be useful tools for optimizing marker panels when developing forensic DNA ancestry inference systems.
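The abstract does not describe how the softmax model was implemented. As a minimal sketch only, assuming genotypes coded as 0/1/2 alternate-allele counts per SNP, the following shows what a softmax (multinomial logistic regression) classifier over the four target populations looks like. The data are simulated with invented allele frequencies, not the study's 234-AISNP panel, and the helper names (`simulate_genotypes`, `train_softmax`, `class_scores`) are hypothetical. The randomForest model is not shown.

```python
import math
import random

# The four target groups named in the study; list index = class label.
POPULATIONS = ["S-Han", "N-Han", "Korean", "Japanese"]


def softmax(scores):
    """Numerically stable softmax: turns class scores into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, b, x):
    """Linear score for each class: w_k . x + b_k."""
    return [sum(w * xj for w, xj in zip(W[k], x)) + b[k] for k in range(len(W))]


def train_softmax(X, y, n_classes, lr=0.05, epochs=100):
    """Fit softmax (multinomial logistic) regression by batch gradient
    descent on the mean cross-entropy loss."""
    n, n_features = len(X), len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for xi, yi in zip(X, y):
            probs = softmax(class_scores(W, b, xi))
            for k in range(n_classes):
                # d(loss)/d(score_k) = p_k - 1[k is the true class]
                err = probs[k] - (1.0 if k == yi else 0.0)
                grad_b[k] += err
                for j in range(n_features):
                    grad_W[k][j] += err * xi[j]
        for k in range(n_classes):
            b[k] -= lr * grad_b[k] / n
            for j in range(n_features):
                W[k][j] -= lr * grad_W[k][j] / n
    return W, b


def predict(W, b, x):
    """Return the index of the highest-scoring population."""
    scores = class_scores(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(W, b, X, y):
    """Fraction of individuals assigned to their true population."""
    return sum(predict(W, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


def simulate_genotypes(n_per_pop, n_snps, rng):
    """HYPOTHETICAL data, not the study's: SNP j is informative for population
    j % 4 (alt-allele frequency 0.8 there, 0.2 in the other three)."""
    X, y = [], []
    for k in range(len(POPULATIONS)):
        freqs = [0.8 if j % len(POPULATIONS) == k else 0.2 for j in range(n_snps)]
        for _ in range(n_per_pop):
            # Genotype = alt-allele count (0, 1 or 2), drawn as Binomial(2, p).
            X.append([float((rng.random() < p) + (rng.random() < p)) for p in freqs])
            y.append(k)
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    X_train, y_train = simulate_genotypes(30, 20, rng)
    X_test, y_test = simulate_genotypes(40, 20, rng)
    W, b = train_softmax(X_train, y_train, len(POPULATIONS))
    print(f"held-out accuracy on simulated data: {accuracy(W, b, X_test, y_test):.2f}")
    print("first test individual predicted as:", POPULATIONS[predict(W, b, X_test[0])])
```

The accuracy printed here reflects only the simulated allele frequencies chosen above. It says nothing about the 92% reported in the study. Evaluating on individuals held out from training, as done here, is the usual way to obtain an accuracy figure that is not inflated by overfitting.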

