Comparison of genetic variants in matched samples using thesaurus annotation.

Authors

Konopka Tomasz, Nijman Sebastian M B

Affiliation

Ludwig Institute for Cancer Research, University of Oxford, Oxford, UK.

Publication

Bioinformatics. 2016 Mar 1;32(5):657-63. doi: 10.1093/bioinformatics/btv654. Epub 2015 Nov 5.

Abstract

MOTIVATION

Calling changes in DNA, e.g. as a result of somatic events in cancer, requires analysis of multiple matched sequenced samples. Events in low-mappability regions of the human genome are difficult to encode in variant call files and have been under-reported as a result. However, they can be described accurately through thesaurus annotation, a technique that links multiple genomic loci together to explicate a single variant.

RESULTS

Here we describe software and benchmarks for using thesaurus annotation to detect point changes in DNA from matched samples. In benchmarks on matched normal/tumor samples, we show that the technique can recover between five and ten percent more true events than conventional approaches, while strictly limiting false discovery and remaining fully consistent with popular variant analysis workflows. We also demonstrate the utility of the approach for analysis of de novo mutations in parent/child families.
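As an illustration only, the idea of linking loci when comparing matched samples can be sketched as follows. This is not the published algorithm or the API of GeneticThesaurus or RGeneticThesaurus: the `thesaurus` mapping, the `AlleleCounts` type, both function names, the thresholds and the read counts are all hypothetical. The sketch assumes that reads carrying a variant in a low-mappability region may align to any of several near-identical loci, so allele evidence is pooled over the linked loci before the tumor is compared with the normal.

```python
from dataclasses import dataclass

# A locus is (chromosome, 1-based position). All names below are hypothetical.
Locus = tuple[str, int]


@dataclass(frozen=True)
class AlleleCounts:
    ref: int  # reads supporting the reference base
    alt: int  # reads supporting the alternate base


def pooled_alt_fraction(locus, counts, thesaurus):
    """Alt-allele fraction pooled over a locus and its thesaurus-linked loci."""
    loci = [locus, *thesaurus.get(locus, [])]
    ref = sum(counts[l].ref for l in loci if l in counts)
    alt = sum(counts[l].alt for l in loci if l in counts)
    total = ref + alt
    return alt / total if total else 0.0


def candidate_somatic(locus, tumor, normal, thesaurus,
                      min_tumor=0.05, max_normal=0.01):
    """Flag a locus when pooled alt evidence is present in tumor but not in normal.

    The thresholds are arbitrary illustration values, not taken from the paper.
    """
    return (pooled_alt_fraction(locus, tumor, thesaurus) >= min_tumor
            and pooled_alt_fraction(locus, normal, thesaurus) <= max_normal)


# Hypothetical data: chr1:1000 lies in a region nearly identical to chr5:2000.
site = ("chr1", 1000)
thesaurus = {site: [("chr5", 2000)]}

tumor = {site: AlleleCounts(ref=30, alt=10),
         ("chr5", 2000): AlleleCounts(ref=40, alt=0)}
normal_clean = {site: AlleleCounts(ref=35, alt=0),
                ("chr5", 2000): AlleleCounts(ref=38, alt=0)}
# Same normal, except the alt reads aligned to the linked locus instead.
normal_linked_alt = {site: AlleleCounts(ref=35, alt=0),
                     ("chr5", 2000): AlleleCounts(ref=30, alt=8)}

print(candidate_somatic(site, tumor, normal_clean, thesaurus))
print(candidate_somatic(site, tumor, normal_linked_alt, thesaurus))
```

In the second call the normal sample looks clean at chr1:1000 on its own, but the linked locus carries alternate reads, so the pooled comparison does not flag the site. A per-locus comparison without the linkage would have missed that evidence.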

AVAILABILITY AND IMPLEMENTATION

Software performing thesaurus annotation is implemented in Java; it is available as source code on GitHub at GeneticThesaurus (https://github.com/tkonopka/GeneticThesaurus) and as an executable on SourceForge at geneticthesaurus (https://sourceforge.net/projects/geneticthesaurus). Mutation calling is implemented in an R package available on GitHub at RGeneticThesaurus (https://github.com/tkonopka/RGeneticThesaurus).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

CONTACT

tomasz.konopka@ludwig.ox.ac.uk.
