Stelzer Gil, Plaschkes Inbar, Oz-Levi Danit, Alkelai Anna, Olender Tsviya, Zimmerman Shahar, Twik Michal, Belinky Frida, Fishilevich Simon, Nudel Ron, Guan-Golan Yaron, Warshawsky David, Dahary Dvir, Kohn Asher, Mazor Yaron, Kaplan Sergey, Iny Stein Tsippi, Baris Hagit N, Rappaport Noa, Safran Marilyn, Lancet Doron
Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel.
LifeMap Sciences Ltd, Tel Aviv, Israel.
BMC Genomics. 2016 Jun 23;17 Suppl 2(Suppl 2):444. doi: 10.1186/s12864-016-2722-2.
Next generation sequencing (NGS) provides a key technology for deciphering the genetic underpinnings of human diseases. A typical NGS analysis of a patient reveals tens of thousands of non-reference coding variants, but only one or very few are expected to be significant for the relevant disorder. In a filtering stage, one employs family segregation, rarity in the population, predicted protein impact and evolutionary conservation as a means of shortening the variant list. However, narrowing down further towards culprit disease genes usually entails a laborious search for gene-phenotype relationships across numerous separate databases. Thus, a major challenge is to transition from the few hundred shortlisted genes to the most viable disease-causing candidates.
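The filtering stage described above can be illustrated with a minimal sketch. The `Variant` fields, the set of impact labels and the thresholds are all hypothetical and chosen only for illustration; they are not taken from the paper or from any particular annotation pipeline.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    allele_freq: float    # population allele frequency, 0..1 (hypothetical field)
    impact: str           # predicted protein impact label (hypothetical labels)
    conservation: float   # conservation score, higher = more conserved (hypothetical scale)
    segregates: bool      # True if consistent with family segregation


# Assumed set of impact labels treated as potentially damaging.
DAMAGING = {"missense", "nonsense", "frameshift", "splice"}


def shortlist(variants, max_af=0.01, min_conservation=2.0):
    """Return the sorted gene symbols whose variants pass all four filters."""
    kept = [
        v for v in variants
        if v.segregates
        and v.allele_freq <= max_af
        and v.impact in DAMAGING
        and v.conservation >= min_conservation
    ]
    return sorted({v.gene for v in kept})


variants = [
    Variant("GENE_A", 0.0001, "missense", 4.2, True),    # passes every filter
    Variant("GENE_B", 0.20, "missense", 5.0, True),      # too common in the population
    Variant("GENE_C", 0.0005, "synonymous", 3.0, True),  # no predicted protein impact
    Variant("GENE_D", 0.001, "frameshift", 0.5, True),   # poorly conserved position
    Variant("GENE_E", 0.0, "nonsense", 3.1, False),      # does not segregate in the family
    Variant("GENE_F", 0.002, "splice", 2.5, True),       # passes every filter
]

shortlisted_genes = shortlist(variants)
print(shortlisted_genes)
```

Even after such filtering, a real case typically leaves a few hundred genes, which is the list a phenotype-driven prioritizer then has to rank.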
We describe a novel tool, VarElect ( http://ve.genecards.org ), a comprehensive phenotype-dependent variant/gene prioritizer, based on the widely-used GeneCards, which helps rapidly identify causal mutations with extensive evidence. The GeneCards suite offers an effective and speedy alternative to consulting separate databases, whereby >120 gene-centric automatically-mined data sources are jointly available for the task. VarElect capitalizes on this wealth of information, as well as on GeneCards' powerful free-text Boolean search and scoring capabilities, proficiently matching variant-containing genes to submitted disease/symptom keywords. The tool also leverages the rich disease and pathway information of MalaCards, the human disease database, and PathCards, the unified pathway (SuperPaths) database, both within the GeneCards Suite. The VarElect algorithm infers direct as well as indirect links between genes and phenotypes, the latter benefitting from GeneCards' diverse gene-to-gene data links in GenesLikeMe. Finally, our tool offers an extensive gene-phenotype evidence portrayal ("MiniCards") and hyperlinks to the parent databases.
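The distinction between direct and indirect gene-phenotype links can be sketched with a toy example. This is not the VarElect algorithm or the GeneCards scoring scheme: the function `rank_genes`, the keyword-count score, the `indirect_weight` parameter and all gene names and annotation texts are hypothetical, introduced only to show the idea of scoring a gene through its own annotations versus through a linked partner gene.

```python
def rank_genes(genes, keywords, annotations, gene_links, indirect_weight=0.5):
    """Rank genes by a toy phenotype score.

    direct score   = number of keyword occurrences in the gene's own annotation text
    indirect score = best direct score among linked genes, down-weighted
    """
    keywords = [k.lower() for k in keywords]

    def direct(gene):
        text = annotations.get(gene, "").lower()
        return sum(text.count(k) for k in keywords)

    rows = []
    for gene in genes:
        d = direct(gene)
        partners = gene_links.get(gene, [])
        ind = max((direct(p) for p in partners), default=0) * indirect_weight
        rows.append((gene, d, ind))
    # Highest combined score first; ties keep their input order.
    return sorted(rows, key=lambda r: r[1] + r[2], reverse=True)


# Hypothetical annotation texts standing in for mined gene-centric data.
annotations = {
    "GENE_A": "associated with cardiomyopathy and arrhythmia",
    "GENE_B": "ribosomal protein",
    "GENE_C": "unknown function",
    "GENE_P": "cardiomyopathy pathway member",
}
# Hypothetical gene-to-gene links (e.g. shared pathway or interaction).
gene_links = {"GENE_B": ["GENE_P"]}

ranked = rank_genes(["GENE_A", "GENE_B", "GENE_C"], ["cardiomyopathy"],
                    annotations, gene_links)
for gene, direct_score, indirect_score in ranked:
    print(gene, direct_score, indirect_score)
```

In this toy data, GENE_B has no keyword match of its own but still receives a score through its link to GENE_P, which is the kind of implicating evidence an indirect mode is meant to surface.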
We demonstrate that VarElect compares favorably with several often-used NGS phenotyping tools, thus providing a robust facility for ranking genes by their likelihood of being related to a patient's disease. VarElect's capacity to automatically process numerous NGS cases, either in stand-alone format or in VCF-analyzer mode (TGex and VarAnnot), is indispensable for emerging clinical projects that involve thousands of whole exome/genome NGS analyses.