Towards Development of Clustering Applications for Large-Scale Comparative Genotyping and Kinship Analysis Using Y-Short Tandem Repeats.

Authors

Seman Ali, Sapawi Azizian Mohd, Salleh Mohd Zaki

Affiliations

1 Integrative Pharmacogenomics Institute (iPROMISE), Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), Shah Alam, Selangor, Malaysia.

2 Center for Computer Science Studies, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), Shah Alam, Selangor, Malaysia.

Publication information

OMICS. 2015 Jun;19(6):361-7. doi: 10.1089/omi.2014.0136. Epub 2015 May 6.

Abstract

Y-chromosome short tandem repeats (Y-STRs) are genetic markers with practical applications in human identification. However, where mass identification is required (e.g., in the aftermath of disasters with significant fatalities), the efficiency of the process could be improved with new statistical approaches. Clustering applications are relatively new tools for large-scale comparative genotyping, and the k-Approximate Modal Haplotype (k-AMH), an efficient algorithm for clustering large-scale Y-STR data, represents a promising method for developing these tools. In this study we improved the k-AMH and produced three new algorithms: the Nk-AMH I (including a new initial cluster center selection), the Nk-AMH II (including a new dominant weighting value), and the Nk-AMH III (combining I and II). The Nk-AMH III was the superior algorithm, with mean clustering accuracy that increased in four out of six datasets and remained at 100% in the other two. Additionally, the Nk-AMH III achieved a 2% higher overall mean clustering accuracy score than the k-AMH, as well as optimal accuracy for all datasets (0.84-1.00). With inclusion of the two new methods, the Nk-AMH III produced an optimal solution for clustering Y-STR data; thus, the algorithm has potential for further development towards fully automatic clustering of any large-scale genotypic data.
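The abstract reports clustering accuracy scores between 0.84 and 1.00 but does not define the measure. As a rough illustration only, the sketch below computes a purity-style accuracy that is common in the categorical clustering literature: each cluster is credited with the objects belonging to its majority class, and the total is divided by the number of objects. The function name `clustering_accuracy`, the haplogroup labels, and the cluster assignments are hypothetical, and this may not be the exact measure the authors used.

```python
from collections import Counter


def clustering_accuracy(labels_true, labels_pred):
    """Purity-style accuracy: share of objects that belong to the
    majority class of the cluster they were assigned to.

    Assumption: this is an illustrative measure, not necessarily the
    one used in the paper.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    if len(labels_true) == 0:
        raise ValueError("label sequences must not be empty")

    # Count the true classes that fall into each predicted cluster.
    per_cluster = {}
    for true_label, cluster_id in zip(labels_true, labels_pred):
        per_cluster.setdefault(cluster_id, Counter())[true_label] += 1

    # Each cluster contributes the size of its largest class.
    correct = sum(max(counts.values()) for counts in per_cluster.values())
    return correct / len(labels_true)


# Hypothetical example: six Y-STR profiles with known haplogroups
# and the cluster each one was assigned to.
haplogroups = ["R", "R", "R", "J", "J", "O"]
clusters = [0, 0, 1, 1, 1, 2]
accuracy = clustering_accuracy(haplogroups, clusters)
print(round(accuracy, 2))  # 0.83
```

Here one profile from haplogroup R lands in the cluster dominated by J, so five of six profiles count as correctly grouped. A score of 1.00, as reported for some datasets, would mean every cluster contains a single class.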

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e8e/4486443/7db7d79e396e/fig-1.jpg
