Iddamalgoda Lahiru, Das Partha S, Aponso Achala, Sundararajan Vijayaraghava S, Suravajhala Prashanth, Valadi Jayaraman K
Department of Computing, Informatics Institute of Technology, University of Westminster, Colombo, Sri Lanka.
Department of Microbiology, Bioinformatics Infrastructure Facility, Vidyasagar University, Midnapore, India; Bioinformatics, Bioclues Organization, Hyderabad, India.
Front Genet. 2016 Aug 10;7:136. doi: 10.3389/fgene.2016.00136. eCollection 2016.
Data mining and pattern recognition methods have revealed interesting findings in genetic studies, especially on how genetic makeup is associated with inherited diseases. Although researchers have proposed various data mining models for biomedical applications, accurately prioritizing the single nucleotide polymorphisms (SNPs) associated with a disease remains a challenge. In this commentary, we review state-of-the-art data mining and pattern recognition models for identifying inherited diseases and consider the need for binary classification-based and scoring-based prioritization methods in determining causal variants. While discussing the known pros and cons of these methods, we argue that gene prioritization methods and protein-protein interaction (PPI) methods, in conjunction with the k-nearest neighbors (KNN) algorithm, could be used to accurately categorize the genetic factors involved in disease causation.
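As an illustration only, the sketch below shows how a single k-nearest-neighbors model can produce both kinds of output contrasted in the abstract: a binary call (majority vote among the k nearest labelled genes) and a score used to rank candidates (the fraction of those neighbors that are disease-associated). It is not the authors' method. The gene names, the two features (assumed here to be a conservation score and the fraction of a gene's PPI partners that are known disease genes, both on a 0 to 1 scale), and all values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical training genes: (name, (conservation, fraction of PPI partners
# that are known disease genes), label). Label 1 = disease-associated, 0 = not.
# All names and values are invented for illustration.
TRAINING: List[Tuple[str, Tuple[float, float], int]] = [
    ("GENE_A", (0.9, 0.8), 1),
    ("GENE_B", (0.8, 0.6), 1),
    ("GENE_C", (0.2, 0.0), 0),
    ("GENE_D", (0.1, 0.2), 0),
    ("GENE_E", (0.7, 0.5), 1),
    ("GENE_F", (0.3, 0.1), 0),
]

# Hypothetical candidate genes to be prioritized.
CANDIDATES: Dict[str, Tuple[float, float]] = {
    "CAND_1": (0.85, 0.7),
    "CAND_2": (0.15, 0.1),
    "CAND_3": (0.55, 0.4),
}


def knn_score(query: Sequence[float], k: int = 3) -> float:
    """Fraction of the k nearest training genes that are disease-labelled."""
    # Sort training genes by Euclidean distance to the query feature vector.
    ranked = sorted(TRAINING, key=lambda row: math.dist(query, row[1]))
    nearest = ranked[:k]
    return sum(label for _, _, label in nearest) / k


def knn_classify(query: Sequence[float], k: int = 3) -> int:
    """Binary classification: 1 if most of the k nearest neighbors are disease genes."""
    return 1 if knn_score(query, k) > 0.5 else 0


def prioritize(candidates: Dict[str, Tuple[float, float]], k: int = 3) -> List[Tuple[str, float]]:
    """Scoring-based prioritization: rank candidates by KNN score, highest first."""
    scored = [(name, knn_score(vec, k)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in prioritize(CANDIDATES):
        print(f"{name}\tscore={score:.2f}\tcall={knn_classify(CANDIDATES[name])}")
```

With these made-up values, CAND_1 ranks first (all three neighbors are disease genes), CAND_3 second (two of three), and CAND_2 last. The binary call collapses CAND_1 and CAND_3 into the same class, while the score keeps them ordered, which is the practical difference between classification and prioritization. Real features would also need normalization so that no single feature dominates the distance.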