BCFtools/liftover: an accurate and comprehensive tool to convert genetic variants across genome assemblies.

Affiliations

Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States.

Stanley Center, Broad Institute of MIT and Harvard, Cambridge, MA 02142, United States.

Publication information

Bioinformatics. 2024 Jan 2;40(2). doi: 10.1093/bioinformatics/btae038.

Abstract

MOTIVATION

Many genetics studies report results tied to genomic coordinates of a legacy genome assembly. However, as assemblies are updated and improved, researchers are faced with either realigning raw sequence data using the updated coordinate system or converting legacy datasets to the updated coordinate system to be able to combine results with newer datasets. Currently available tools to perform the conversion of genetic variants have numerous shortcomings, including poor support for indels and multi-allelic variants, that lead to a higher rate of variants being dropped or incorrectly converted. As a result, many researchers continue to work with and publish using legacy genomic coordinates.

RESULTS

Here we present BCFtools/liftover, a tool that converts genomic coordinates across genome assemblies for variants encoded in the Variant Call Format (VCF). It offers improved support for indels represented by different reference alleles across genome assemblies and full support for multi-allelic variants. It also updates variant annotation fields whenever the reference allele changes across genome assemblies. The tool has the lowest rate of dropped variants, with an order of magnitude fewer indels dropped or incorrectly converted, and is an order of magnitude faster than other tools typically used for the same task. It is particularly suited for converting variant callsets from large cohorts to novel telomere-to-telomere assemblies, as well as summary statistics from genome-wide association studies tied to legacy genome assemblies.
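To make the idea of a reference allele changing across assemblies concrete, here is a minimal Python sketch. It is not the tool's implementation, which is written in C and also handles indels, multi-allelic variants and strand changes. The `Block`, `Snp`, `lift_position` and `lift_snp` names are hypothetical, the alignment is reduced to forward-strand ungapped blocks, and the only annotation updated is an ALT allele frequency, which becomes 1 minus its old value when the two alleles swap roles.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Block:
    """One ungapped, forward-strand alignment block (hypothetical, simplified)."""
    src_start: int  # 0-based start on the legacy assembly
    dst_start: int  # 0-based start on the new assembly
    size: int       # block length in bases


@dataclass(frozen=True)
class Snp:
    pos: int    # 0-based position
    ref: str    # reference allele
    alt: str    # alternate allele
    af: float   # frequency of the ALT allele


def lift_position(pos: int, blocks: list[Block]) -> Optional[int]:
    """Map a position through the first block that contains it, else None."""
    for b in blocks:
        if b.src_start <= pos < b.src_start + b.size:
            return b.dst_start + (pos - b.src_start)
    return None


def lift_snp(snp: Snp, blocks: list[Block], new_reference: str) -> Optional[Snp]:
    """Lift a biallelic SNP, swapping REF and ALT if the new reference requires it."""
    new_pos = lift_position(snp.pos, blocks)
    if new_pos is None:
        return None  # position is not covered by any block: variant is dropped
    new_base = new_reference[new_pos]
    if new_base == snp.ref:
        return replace(snp, pos=new_pos)  # same reference allele, only POS changes
    if new_base == snp.alt:
        # Reference allele changed: swap the alleles and flip the ALT frequency.
        return Snp(pos=new_pos, ref=snp.alt, alt=snp.ref, af=1.0 - snp.af)
    return None  # neither allele matches; this sketch does not handle that case


# Toy example: legacy positions 0..3 map to new positions 2..5.
blocks = [Block(src_start=0, dst_start=2, size=4)]
new_reference = "TTACGG"
lifted = lift_snp(Snp(pos=2, ref="A", alt="G", af=0.25), blocks, new_reference)
print(lifted)
```

In this toy case the new assembly carries the former ALT base at the lifted position, so the sketch swaps the alleles and recomputes the frequency instead of dropping the variant.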

AVAILABILITY AND IMPLEMENTATION

The tool is written in C and freely available under the MIT open source license as a BCFtools plugin available at http://github.com/freeseek/score.
