Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany.
Genome Biol. 2021 Mar 5;22(1):75. doi: 10.1186/s13059-021-02294-2.
Controlling quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterize common NGS quality features and develop a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal and external functional genomics datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at https://github.com/salbrec/seqQscorer .
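As a rough illustration of what tree-based classification on quality features can look like, the sketch below fits a one-level decision tree (a decision stump) that separates low-quality from high-quality files. It is not the seqQscorer implementation. The feature names (`duplication_rate`, `unmapped_fraction`), the training values, and every function name are hypothetical and chosen only for this example.

```python
# Illustrative sketch only: synthetic data and hypothetical feature names,
# not the seqQscorer implementation.
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label: 1 = low quality, 0 = high quality)

FEATURES = ["duplication_rate", "unmapped_fraction"]  # hypothetical quality features


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels: List[int]) -> int:
    """Most frequent binary label (ties resolve to 1)."""
    return 1 if 2 * sum(labels) >= len(labels) else 0


def fit_stump(samples: List[Sample]) -> Tuple[int, float, int, int]:
    """Find the single split that minimises the weighted Gini impurity.

    Returns (feature index, threshold, label left of split, label right of split).
    """
    best = (0, 0.0, 0, 1)
    best_score = float("inf")
    n_features = len(samples[0][0])
    for j in range(n_features):
        values = sorted({x[j] for x, _ in samples})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2.0
            left = [y for x, y in samples if x[j] <= t]
            right = [y for x, y in samples if x[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
            if score < best_score:
                best_score = score
                best = (j, t, majority(left), majority(right))
    return best


def predict(stump: Tuple[int, float, int, int], x: List[float]) -> int:
    """Classify one feature vector with a fitted stump."""
    j, t, left_label, right_label = stump
    return left_label if x[j] <= t else right_label


# Synthetic training set: three high-quality and three low-quality files.
train: List[Sample] = [
    ([0.05, 0.02], 0),
    ([0.10, 0.04], 0),
    ([0.08, 0.03], 0),
    ([0.45, 0.30], 1),
    ([0.60, 0.25], 1),
    ([0.52, 0.40], 1),
]

stump = fit_stump(train)
print(f"split on {FEATURES[stump[0]]} at {stump[1]:.3f}")
```

A full tree or forest repeats this split search recursively on each side of the split. A single stump is used here only to keep the example short.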