Ensemble learning algorithms for classification of mtDNA into haplogroups.

Author information

Department of Bioengineering, University of Pennsylvania, USA.

Publication information

Brief Bioinform. 2011 Jan;12(1):1-9. doi: 10.1093/bib/bbq008. Epub 2010 Mar 4.

Abstract

Classifying mitochondrial DNA (mtDNA) sequences into their respective haplogroups helps address various anthropological and forensic questions. Unique to mtDNA is its abundance and its non-recombining, uniparental mode of inheritance; consequently, mutations are the only changes observed in the genetic material. These individual mutations are classified into their cladistic haplogroups, which allows different genetic branch points in the evolution of humans (and other organisms) to be traced. Because of the large number of samples, it becomes necessary to automate the classification process. Using 5-fold cross-validation, we investigated two classification techniques on the consented database of 21,141 samples published by the Genographic project. The support vector machine (SVM) algorithm achieved a macro-accuracy of 88.06% and a micro-accuracy of 96.59%, while the random forest (RF) algorithm achieved a macro-accuracy of 87.35% and a micro-accuracy of 96.19%. In addition to being faster and more memory-efficient in making predictions, SVM and RF are better than or comparable to the nearest-neighbor method employed by the Genographic project in terms of prediction accuracy.
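
The gap between the macro- and micro-accuracy figures above is easier to see with a small worked example. The sketch below is illustrative only and is not the authors' code. It assumes that micro-accuracy means the overall fraction of correctly classified samples and that macro-accuracy means the unweighted mean of per-haplogroup accuracies; the abstract does not define either term. The function names, the simple interleaved fold split, and the toy labels are all hypothetical.

```python
from collections import defaultdict


def k_fold_indices(n, k=5):
    """Yield (train, test) index lists for a simple interleaved k-fold split.

    Fold i tests on samples i, i + k, i + 2k, ... and trains on the rest.
    (Assumption: the abstract does not say how the folds were drawn.)
    """
    for i in range(k):
        test = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        yield train, test


def micro_accuracy(y_true, y_pred):
    """Fraction of all samples classified correctly (assumed definition)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracies (assumed definition)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy, made-up labels: a common haplogroup and a rare one.
y_true = ["H"] * 8 + ["L3"] * 2
y_pred = ["H"] * 8 + ["H", "L3"]  # one rare-class sample is misclassified

micro = micro_accuracy(y_true, y_pred)
macro = macro_accuracy(y_true, y_pred)
folds = list(k_fold_indices(len(y_true), k=5))
```

Under these assumed definitions, a single error in a rare class barely moves the micro figure but pulls the macro figure down sharply, which is one plausible reading of why the reported macro-accuracies (about 88%) sit well below the micro-accuracies (about 96%).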

Similar articles

A fast approximate nearest neighbor search algorithm in the Hamming space.
IEEE Trans Pattern Anal Mach Intell. 2012 Dec;34(12):2481-8. doi: 10.1109/TPAMI.2012.170.

Cited by

The Role and Mechanism of Metformin in Inflammatory Diseases.
J Inflamm Res. 2023 Nov 23;16:5545-5564. doi: 10.2147/JIR.S436147. eCollection 2023.
The use of classification trees for bioinformatics.
Wiley Interdiscip Rev Data Min Knowl Discov. 2011 Jan;1(1):55-63. doi: 10.1002/widm.14. Epub 2011 Jan 6.
