Division of Genetics, Brigham and Women's Hospital, Boston, Massachusetts, USA.
Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA.
Genet Med. 2018 Sep;20(9):936-941. doi: 10.1038/gim.2017.230. Epub 2018 Feb 1.
Over 150,000 variants have been reported to cause Mendelian disease in the medical literature. It is still difficult to leverage this knowledge base in clinical practice, as many reports lack strong statistical evidence or may include false associations. Clinical laboratories assess whether these variants (along with newly observed variants that are adjacent to these published ones) underlie clinical disorders.
We investigated whether citation data, including journal impact factor and the number of cited variants (NCV) in each gene with published disease associations, can be used to improve variant assessment.
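The NCV is a per-gene count of the variants cited as disease-causing in the literature. The Python sketch below shows one way to compute it. It assumes a hypothetical list of (gene, variant) citation records, such as might be extracted from a curated variant database, and it assumes each distinct variant counts once even when several papers cite it. The record format and the name `compute_ncv` are illustrative only and are not the authors' pipeline.

```python
from collections import defaultdict


def compute_ncv(citations):
    """Return {gene: number of distinct cited variants} (hypothetical helper).

    `citations` is an iterable of (gene, variant_id) pairs, one per
    published report. Assumption: a variant cited in several papers
    is counted once for its gene.
    """
    variants_by_gene = defaultdict(set)
    for gene, variant_id in citations:
        variants_by_gene[gene].add(variant_id)
    return {gene: len(ids) for gene, ids in variants_by_gene.items()}


# Hypothetical records: (gene symbol, variant identifier)
records = [
    ("GENE_A", "c.100A>G"),
    ("GENE_A", "c.250del"),
    ("GENE_A", "c.100A>G"),  # a second paper citing the same variant
    ("GENE_B", "c.31C>T"),
]
print(compute_ncv(records))  # {'GENE_A': 2, 'GENE_B': 1}
```

Using a set per gene makes repeated citations of the same variant idempotent. Counting every citation instead would measure publication volume rather than the breadth of known disease variants.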
Surprisingly, we found that impact factor is not predictive of pathogenicity, but the NCV score for each gene can provide statistical support for predicting pathogenicity. When this gene-level citation metric is combined with variant-level evolutionary conservation and structural features, classification accuracy reaches 89.5%. Further, variants identified in clinical exome sequencing cases have higher NCVs than simulated rare variants from the Exome Aggregation Consortium database within the same set of genes and functional consequences (P < 2.22 × 10⁻¹⁶).
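Combining a gene-level metric with variant-level features means giving every variant in a gene the same NCV value alongside its own conservation and structural scores. Any classifier can then use the resulting rows as input. The sketch below builds those rows and scores predictions by accuracy, which is the fraction of correct labels. The feature keys (`conservation`, `structural`), the log1p transform, and the default of 0 for uncited genes are all hypothetical choices for illustration. The abstract does not specify the authors' features or classifier.

```python
import math


def build_features(variants, ncv):
    """Attach the gene-level NCV to each variant's own features.

    `variants`: list of dicts with hypothetical keys 'gene',
    'conservation' and 'structural'. Genes absent from `ncv` get 0.
    log1p is one common way to damp large counts (an assumption here).
    """
    rows = []
    for v in variants:
        gene_score = math.log1p(ncv.get(v["gene"], 0))
        rows.append([gene_score, v["conservation"], v["structural"]])
    return rows


def accuracy(predicted, observed):
    """Fraction of variants whose predicted label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("label lists differ in length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Hypothetical inputs: NCV per gene and two variants with made-up scores
ncv = {"GENE_A": 2, "GENE_B": 1}
variants = [
    {"gene": "GENE_A", "conservation": 0.9, "structural": 1.0},
    {"gene": "GENE_C", "conservation": 0.2, "structural": 0.0},  # uncited gene
]
rows = build_features(variants, ncv)
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
```

Because the NCV is shared by all variants in a gene, it cannot separate variants within one gene. The variant-level features carry that information, which is why the two kinds of features complement each other.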
Aggregate citation data can complement existing variant-based predictive algorithms and boost their performance without the need to access and review large numbers of papers. The NCV is a slow-growing metric of scientific knowledge about each gene's association with disease.