Yosifov Deyan Yordanov, Schneider Christof, Stilgenbauer Stephan, Mertens Daniel, Tausch Eugen
Division of CLL, Department of Internal Medicine III, Ulm University Hospital, Ulm, Germany.
Cooperation Unit "Mechanisms of Leukemogenesis", German Cancer Research Center (DKFZ), Heidelberg, Germany.
BMC Res Notes. 2025 Jul 2;18(1):270. doi: 10.1186/s13104-025-07348-3.
Mislabelling and swapping of laboratory samples are handling errors that can lead to erroneous interpretation of data and/or patient harm. Sequenced samples can be traced back to their respective donors by matching single nucleotide polymorphisms (SNPs). Frameworks and software for this purpose have been developed for whole genome/exome sequencing data but not for targeted next-generation sequencing (tNGS), possibly because of the limited genomic coverage of tNGS and the need to individualize the set of interrogated SNPs. We decided to adapt a popular tool for use with tNGS data, to demonstrate that informative SNPs can be selected from a typical tNGS panel, and to create an automated workflow for detecting sample handling errors.
We compiled a custom list of 28 SNPs and used it to demonstrate that tNGS data alone are sufficient to detect mislabelled samples cost-effectively. In two cohorts totalling 1441 patients with sequential samples, we identified 3 sample swaps, 7 mislabelled samples (3 mislabelled externally and 4 internally) and 1 mistake of unknown origin. We provide an R function for automated detection of sample swaps and mislabelling to the community as a free and open-source tool.
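The principle of SNP-based sample matching can be illustrated with a minimal Python sketch. This is not the authors' R function or the tool they adapted. The genotype coding (0, 1, 2 alternate alleles, `None` for a missing call), the names `concordance` and `flag_discrepancies`, the 0.9 concordance threshold, the `min_sites` cutoff and the toy genotypes are all assumptions made for illustration only.

```python
from itertools import combinations

def concordance(a, b):
    """Fraction of SNPs with identical genotype, skipping sites missing in either sample."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return sum(x == y for x, y in shared) / len(shared)

def flag_discrepancies(samples, threshold=0.9, min_sites=10):
    """Compare all sample pairs and report label/genotype disagreements.

    samples maps sample_id -> (patient_label, genotype list).
    threshold and min_sites are illustrative values, not taken from the paper.
    """
    flags = []
    for (id1, (p1, g1)), (id2, (p2, g2)) in combinations(samples.items(), 2):
        n_shared = sum(x is not None and y is not None for x, y in zip(g1, g2))
        if n_shared < min_sites:
            continue  # too few jointly called SNPs to judge this pair
        c = concordance(g1, g2)
        if p1 == p2 and c < threshold:
            flags.append((id1, id2, "same label, genotypes differ", c))
        elif p1 != p2 and c >= threshold:
            flags.append((id1, id2, "different labels, genotypes match", c))
    return flags

# Toy genotypes for two hypothetical donors at 12 SNPs (not real data).
donor_a = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2, 1, 0]
donor_b = [2, 0, 1, 0, 1, 1, 0, 2, 2, 0, 0, 1]
donor_a_followup = donor_a[:5] + [None] + donor_a[6:]  # one missing call

samples = {
    "P1_t1": ("P1", donor_a),
    "P1_t2": ("P1", donor_b),           # follow-up tubes swapped between P1 and P2
    "P2_t1": ("P2", donor_b),
    "P2_t2": ("P2", donor_a_followup),
}

flags = flag_discrepancies(samples)
for sample_1, sample_2, reason, score in flags:
    print(sample_1, sample_2, reason, round(score, 2))
```

In this toy case the two follow-up samples are swapped, so each patient's sequential pair disagrees while each follow-up matches the other patient's baseline. Real data would need a threshold tuned to the panel, because unrelated individuals share some genotypes by chance and sequencing errors lower the concordance of true matches.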