de Barros Rodrigues Maria Luisa, Rodrigues Marcelo Porto, Norton Heather L, Mendes-Junior Celso Teixeira, Simões Aguinaldo Luiz, Lawson Daniel John
Programa de Pós-Graduação em Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Av. Bandeirantes 3900, Ribeirão Preto, SP 14049-900, Brazil.
Retired.
Forensic Sci Int Genet. 2025 Jan;74:103153. doi: 10.1016/j.fsigen.2024.103153. Epub 2024 Oct 5.
Microhaplotypes (MHs) are sets of physically close genetic markers that are inherited together, and they are gaining prominence because of their efficiency in forensic, clinical, and population studies. They excel in kinship analysis, DNA mixture detection, and ancestry inference, offering advantages in precision over individual single nucleotide polymorphisms (SNPs) and short tandem repeats (STRs). In this study, a pipeline was developed to efficiently select highly informative MHs from large-scale genomic datasets. Over 120,000 MHs were identified from almost a million markers, allowing the non-independent information carried by linked markers to be used efficiently for inference. The MHs were compared with SNPs in terms of informativeness and of the performance of marker subsets in ancestry inference, and all results consistently favored MHs. A method for ranking markers by population-specific informativeness was also introduced; it improved the accuracy of Native American ancestry estimation, overcoming the challenge posed by the underrepresentation of this ancestry in datasets. In conclusion, this study presents a comprehensive approach to selecting highly informative MHs for accurate ancestry inference. The proposed approach and the subsets selected by population-specific informativeness offer valuable tools for improving ancestry inference accuracy, particularly for admixed populations, as demonstrated with a Brazilian dataset.
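The abstract does not state which informativeness statistic the pipeline uses. As a minimal sketch only, the idea of ranking markers by informativeness can be illustrated with Rosenberg's informativeness for assignment (I_n), which is an assumption here and not necessarily the authors' measure. The function names, marker names, and frequencies below are hypothetical and chosen purely for illustration.

```python
import math

def informativeness(freqs):
    """Rosenberg's I_n for one marker (assumed measure, not the paper's).

    freqs: one list of allele (or haplotype) frequencies per population;
    every list has the same length and sums to 1.
    """
    k = len(freqs)
    total = 0.0
    for j in range(len(freqs[0])):
        # Mean frequency of allele j across populations.
        mean_p = sum(pop[j] for pop in freqs) / k
        if mean_p > 0:
            total -= mean_p * math.log(mean_p)
        for pop in freqs:
            if pop[j] > 0:
                total += (pop[j] / k) * math.log(pop[j])
    return total

def rank_markers(markers):
    """Return marker names sorted from most to least informative."""
    return sorted(markers, key=lambda name: informativeness(markers[name]),
                  reverse=True)

# Hypothetical toy data for two populations.
# The microhaplotype has four haplotypes (AB, Ab, aB, ab); each SNP taken
# alone has the same allele frequencies in both populations.
markers = {
    "snp_1": [[0.5, 0.5], [0.5, 0.5]],
    "snp_2": [[0.5, 0.5], [0.5, 0.5]],
    "mh_1": [[0.5, 0.0, 0.0, 0.5], [0.0, 0.5, 0.5, 0.0]],
}

ranking = rank_markers(markers)
```

In this toy case each SNP alone carries no ancestry information, while the haplotype built from the same two sites separates the populations completely. It is a deliberately extreme example of the non-independent information that a microhaplotype can capture.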