Jalali Sefid Dashti Mahjoubeh, Gamieldien Junaid
South African National Bioinformatics Institute, University of the Western Cape, Bellville, South Africa.
Biotechniques. 2017 Jan 1;62(1):18-30. doi: 10.2144/000114492.
Next-generation sequencing (NGS) of whole genomes and exomes is a powerful tool in biomedical research and clinical diagnostics. However, the vast amount of data produced by NGS introduces new challenges and opportunities, many of which require novel computational and theoretical approaches to identify the causal variant(s) for a disease of interest. While workflows and associated software for processing raw data and producing high-confidence variant calls have improved significantly, filtering tens of thousands of candidates to identify a subset relevant to a specific study remains a complex exercise best left to bioinformaticists. Yet because this prioritization procedure requires biological/biomedical reasoning, biologists and clinicians are increasingly motivated to handle the task themselves. Here, we describe a set of guidelines, tools, and online resources that can be used to identify functional variants from whole-genome and whole-exome variant calls and then to prioritize those with potential associations to phenotypes of interest. Insights gained from a recently published analysis of protein-coding gene variation in >60,000 humans by the Exome Aggregation Consortium (ExAC) are also taken into account.