Librarian: A quality control tool to analyse sequencing library compositions.

Author information

Vashishtha Kartavya, Gaud Caroline, Andrews Simon, Krueger Christel

Affiliations

Independent Researcher, New Delhi, India.

Bioinformatics, Babraham Institute, Cambridge, CB22 3AT, UK.

Publication information

F1000Res. 2022 Sep 29;11:1122. doi: 10.12688/f1000research.125325.2. eCollection 2022.

Abstract

BACKGROUND

Robust analysis of DNA sequencing data needs to include a set of quality control steps to ensure that technical bias is kept to a minimum. One easily obtained metric is the frequency of each nucleobase at each position across all sequencing reads. Here, we explore the differences in nucleobase compositions of various library types produced by standard experimental methodologies.
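To make the per-position metric concrete, the following is a minimal sketch of how such a composition profile could be computed from a list of read sequences. It is an illustration only and not Librarian's implementation; the function name `per_position_composition` and the toy reads are assumptions introduced here.

```python
from collections import Counter


def per_position_composition(reads, bases="ACGT"):
    """Return one dict per read position mapping each base to its frequency.

    Hypothetical helper for illustration. Characters outside `bases`
    (for example N) are ignored, and each position is normalised by the
    number of counted bases at that position, so shorter reads simply
    contribute nothing to later positions.
    """
    if not reads:
        return []
    length = max(len(read) for read in reads)
    counts = [Counter() for _ in range(length)]
    for read in reads:
        for position, base in enumerate(read.upper()):
            if base in bases:
                counts[position][base] += 1
    profile = []
    for counter in counts:
        total = sum(counter.values())
        profile.append(
            {base: (counter[base] / total if total else 0.0) for base in bases}
        )
    return profile


# Toy reads, made up for this example.
reads = ["ACGT", "ACGA", "TCGT", "ACTT"]
profile = per_position_composition(reads)
print(profile[0])  # {'A': 0.75, 'C': 0.0, 'G': 0.0, 'T': 0.25}
```

In practice the reads would come from a FASTQ file and only a fixed number of leading positions would typically be summarised; both choices are left out here to keep the sketch short.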

METHODS

We obtained the compositions of nearly 3000 publicly available datasets and subjected them to Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction for a two-dimensional representation of their composition characteristics.

RESULTS

We find that most library types result in a specific composition profile. We use this to estimate how strongly the composition of a test library resembles the profiles of previously published libraries, and how likely the test sample is to be of a particular type. We introduce Librarian, a user-friendly web application and command-line tool that enables checking the base compositions of test libraries against known library types.
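One simple way to picture "how likely the test sample is to be of a particular type" is to look at which library types dominate among the reference profiles closest to the test profile. The sketch below does this with a plain nearest-neighbour vote on flattened composition vectors. It is an assumption-laden illustration, not the method Librarian uses: the function names, the choice of Euclidean distance, the value of `k`, and the labels `type_A` and `type_B` are all hypothetical.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def neighbour_type_shares(test_vector, references, k=3):
    """Share of each library type among the k references nearest to the test.

    `references` is a list of (library_type, vector) pairs. Hypothetical
    helper for illustration only.
    """
    nearest = sorted(references, key=lambda ref: euclidean(test_vector, ref[1]))[:k]
    votes = Counter(library_type for library_type, _ in nearest)
    return {library_type: n / len(nearest) for library_type, n in votes.items()}


# Toy reference vectors (one position, order A, C, G, T), made up for this example.
references = [
    ("type_A", [0.25, 0.25, 0.25, 0.25]),
    ("type_A", [0.26, 0.24, 0.25, 0.25]),
    ("type_B", [0.10, 0.40, 0.40, 0.10]),
    ("type_B", [0.12, 0.38, 0.40, 0.10]),
    ("type_A", [0.24, 0.26, 0.26, 0.24]),
]
test_vector = [0.25, 0.26, 0.24, 0.25]
shares = neighbour_type_shares(test_vector, references, k=3)
print(shares)  # {'type_A': 1.0}
```

A test library whose nearest neighbours are mostly of a different type than the one expected would be a candidate for closer inspection.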

CONCLUSIONS

Library preparation methods strongly influence the per-position nucleobase content. By comparing test libraries to a database of previously published library types, we can make predictions regarding the library preparation method. Librarian is a user-friendly tool to access this information for quality assurance purposes, as discrepancies can flag potential irregularities very early on.


