Towards Development of Clustering Applications for Large-Scale Comparative Genotyping and Kinship Analysis Using Y-Short Tandem Repeats.

Author information

Seman Ali, Sapawi Azizian Mohd, Salleh Mohd Zaki

Affiliations

1 Integrative Pharmacogenomics Institute (iPROMISE), Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), Shah Alam, Selangor, Malaysia.

2 Center for Computer Science Studies, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), Shah Alam, Selangor, Malaysia.

Publication information

OMICS. 2015 Jun;19(6):361-7. doi: 10.1089/omi.2014.0136. Epub 2015 May 6.

DOI:10.1089/omi.2014.0136

PMID:25945508

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4486443/

Abstract

Y-chromosome short tandem repeats (Y-STRs) are genetic markers with practical applications in human identification. However, where mass identification is required (e.g., in the aftermath of disasters with significant fatalities), the efficiency of the process could be improved with new statistical approaches. Clustering applications are relatively new tools for large-scale comparative genotyping, and the k-Approximate Modal Haplotype (k-AMH), an efficient algorithm for clustering large-scale Y-STR data, represents a promising method for developing these tools. In this study we improved the k-AMH and produced three new algorithms: the Nk-AMH I (including a new initial cluster center selection), the Nk-AMH II (including a new dominant weighting value), and the Nk-AMH III (combining I and II). The Nk-AMH III was the superior algorithm, with mean clustering accuracy that increased in four out of six datasets and remained at 100% in the other two. Additionally, the Nk-AMH III achieved a 2% higher overall mean clustering accuracy score than the k-AMH, as well as optimal accuracy for all datasets (0.84-1.00). With inclusion of the two new methods, the Nk-AMH III produced an optimal solution for clustering Y-STR data; thus, the algorithm has potential for further development towards fully automatic clustering of any large-scale genotypic data.
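
The abstract reports mean clustering accuracy without giving the formula. As a minimal sketch, assuming accuracy is the purity-style share of haplotypes that fall in a cluster whose majority group matches their own (a common definition for categorical clustering, not confirmed by the abstract), the score could be computed as below. The function name `clustering_accuracy` and the toy labels are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter


def clustering_accuracy(cluster_labels, true_groups):
    """Purity-style accuracy: each cluster is credited with its majority group."""
    if len(cluster_labels) != len(true_groups):
        raise ValueError("inputs must have the same length")
    if not cluster_labels:
        raise ValueError("inputs must be non-empty")

    # Count how often each known group occurs inside each cluster.
    members = {}
    for cluster, group in zip(cluster_labels, true_groups):
        members.setdefault(cluster, Counter())[group] += 1

    # Items in a cluster's most frequent group count as correctly clustered.
    correct = sum(counts.most_common(1)[0][1] for counts in members.values())
    return correct / len(cluster_labels)


# Hypothetical toy data: cluster ids for six haplotypes and their known groups.
clusters = [0, 0, 0, 1, 1, 1]
groups = ["R1b", "R1b", "I1", "I1", "I1", "I1"]
accuracy = clustering_accuracy(clusters, groups)
print(round(accuracy, 2))
```

Under this assumed definition, a score of 1.00 means every cluster contains haplotypes from a single known group, which is how the upper end of the reported 0.84-1.00 range would be read.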

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e8e/4486443/7db7d79e396e/fig-1.jpg
