Department of Biostatistics and Health Informatics, King's College London, London, UK.
Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK.
BMC Bioinformatics. 2019 Apr 27;20(1):213. doi: 10.1186/s12859-019-2791-8.
Next Generation Sequencing (NGS) is a commonly used technology for studying the genetic basis of biological processes, and it underpins the aspirations of precision medicine. However, there are significant challenges when dealing with NGS data. Firstly, a huge number of bioinformatics tools exist for a wide range of uses, which makes it challenging to design an analysis pipeline. Secondly, NGS analysis is computationally intensive and requires expensive infrastructure; many medical and research centres do not have adequate high performance computing facilities, and cloud computing is not always an option because of privacy and ownership issues. Finally, the interpretation of the results is not trivial, and most available pipelines lack utilities to support this crucial step.
We have therefore developed a fast and efficient bioinformatics pipeline that allows for the analysis of DNA sequencing data while requiring little computational effort and memory usage. DNAscan can analyse a whole exome sequencing sample in 1 h and a 40x whole genome sequencing sample in 13 h on a midrange computer. The pipeline can look for single nucleotide variants, small indels, structural variants, repeat expansions, and genetic material from viruses (or any other organism). Its results are annotated using a customisable variety of databases and can be visualised on the fly with a local deployment of the gene.iobio platform. DNAscan is implemented in Python. Its code and documentation are available on GitHub: https://github.com/KHP-Informatics/DNAscan. Instructions for an easy and fast deployment with Docker and Singularity are also provided on GitHub.
DNAscan is an extremely fast and computationally efficient pipeline for the analysis, visualisation and interpretation of NGS data. It is designed to provide a powerful and easy-to-use tool for applications in biomedical research and diagnostic medicine, at minimal computational cost. Its comprehensive approach will maximise the potential audience of users, bringing such analyses within the reach of non-specialist laboratories and of centres with limited funding.