ATAV: a comprehensive platform for population-scale genomic analyses.

Affiliation

Institute for Genomic Medicine, Columbia University Irving Medical Center, New York, NY, 10032, USA.

Publication information

BMC Bioinformatics. 2021 Mar 23;22(1):149. doi: 10.1186/s12859-021-04071-1.

Abstract

BACKGROUND

A common approach in sequencing studies is to perform joint calling and store the variants of all samples in a single file. If new samples are continually added, or if controls are reused across several studies, the cost and time of repeating joint calling for each analysis can become prohibitive.

RESULTS

We present ATAV, an analysis platform for large-scale whole-exome and whole-genome sequencing projects. ATAV stores variant and per-site coverage data for all samples in a centralized database, which ATAV queries efficiently to support diagnostic analyses of trios and singletons, as well as rare-variant collapsing analyses for finding disease associations in complex diseases. Runtime logs ensure full reproducibility, and the modular ATAV framework makes it easy to extend through continuous development. Besides helping to identify disease-causing variants for a range of diseases, ATAV has enabled the discovery of disease genes through rare-variant collapsing on datasets of more than 20,000 samples. Analyses to date have been performed on data from more than 110,000 individuals, demonstrating the scalability of the framework. To let users easily access variant-level data directly from the database, we provide a web-based interface, the ATAV data browser (http://atavdb.org/). Through this browser, the general public can query summary-level data for more than 40,000 samples, representing a mix of cases and controls of diverse ancestries. Users have access to the phenotype categories of variant carriers, as well as predicted ancestry, gender, and quality metrics. In contrast to many other platforms, the data browser shows data from newly added samples in real time and therefore evolves rapidly as more samples are sequenced.
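The centralized design described above relies on per-site coverage: with it, a query can tell "well covered, no variant called" apart from "not covered", a distinction that a jointly called multi-sample file would otherwise encode. The following is a minimal Python sketch of that idea feeding a gene-level rare-variant collapsing test. It is not ATAV's code. The in-memory dictionaries standing in for the database, the helper names (`genotype_at`, `collapse_gene`, `fisher_two_sided`), the 10-read `min_depth` threshold, the rule that drops samples with no covered qualifying site, and the choice of a two-sided Fisher's exact test are all illustrative assumptions.

```python
from math import comb

# Hypothetical stand-ins for database tables: a site is (chromosome, position),
# `calls` maps site -> alt-allele count (1 or 2) for variants called in a sample,
# and `depth` maps site -> read depth from per-site coverage data.

def genotype_at(calls, depth, site, min_depth=10):
    """Return the alt-allele count at `site`, or None if it is not callable.

    min_depth is an assumed threshold, not a value taken from ATAV.
    """
    if site in calls:
        return calls[site]
    if depth.get(site, 0) >= min_depth:
        return 0  # well covered and no variant called: homozygous reference
    return None  # insufficient coverage: genotype unknown


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x carriers among the r1 cases.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def collapse_gene(qualifying_sites, samples, min_depth=10):
    """Collapse qualifying variants in one gene and test cases vs controls.

    samples maps sample_id -> (is_case, calls, depth). Returns the table
    (case carriers, case non-carriers, control carriers, control non-carriers)
    and the Fisher p-value.
    """
    carriers = {True: 0, False: 0}
    non_carriers = {True: 0, False: 0}
    for is_case, calls, depth in samples.values():
        gts = [genotype_at(calls, depth, s, min_depth) for s in qualifying_sites]
        if any(g is not None and g > 0 for g in gts):
            carriers[is_case] += 1
        elif any(g is not None for g in gts):
            non_carriers[is_case] += 1
        # Otherwise no qualifying site is covered, so the sample is left out
        # (an assumed rule for this sketch).
    table = (carriers[True], non_carriers[True],
             carriers[False], non_carriers[False])
    return table, fisher_two_sided(*table)


# Toy data: two qualifying sites in one hypothetical gene.
SITES = [("1", 100), ("1", 200)]
FULL = {("1", 100): 30, ("1", 200): 30}
SAMPLES = {
    "case1": (True, {("1", 100): 1}, FULL),
    "case2": (True, {("1", 200): 1}, FULL),
    "case3": (True, {}, FULL),
    "ctrl1": (False, {}, FULL),
    "ctrl2": (False, {}, FULL),
    "ctrl3": (False, {}, {}),  # no coverage at either site: excluded
    "ctrl4": (False, {}, {("1", 100): 5, ("1", 200): 40}),  # one site callable
}

table, p = collapse_gene(SITES, SAMPLES)
print(table, round(p, 3))  # → (2, 1, 0, 3) 0.4
```

In this sketch, adding a sample only adds its own entries to `SAMPLES`; nothing already stored has to be recomputed before the collapsing query is rerun.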

CONCLUSIONS

Through ATAV, users have public access to one of the largest variant databases for patients sequenced at a tertiary care center and can look up any genes or variants of interest. Additionally, since the entire code base is freely available on GitHub, ATAV can easily be deployed by other groups that wish to build their own platform, database, and user interface.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d376/7988908/5667f4277806/12859_2021_4071_Fig1_HTML.jpg
