Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, Grenoble, France.
Molecular Oncology Laboratory, Children's Cancer Research Unit, Kids Research, The Children's Hospital at Westmead, Westmead, New South Wales, Australia.
PLoS One. 2019 Mar 1;14(3):e0213266. doi: 10.1371/journal.pone.0213266. eCollection 2019.
Nucleotide sequence reagents are verifiable experimental reagents in biomedical publications, because their sequence identities can be independently verified and compared with associated text descriptors. We have previously reported that incorrectly identified nucleotide sequence reagents are characteristic of highly similar human gene knockdown studies, some of which have been retracted from the literature on account of possible research fraud. Because manual verification of nucleotide sequences is limited in throughput, we developed a semi-automated fact-checking tool, Seek & Blastn, to verify the targeting or non-targeting status of published nucleotide sequence reagents. From a previously described corpus of 48 publications and an unknown corpus of 155 publications, Seek & Blastn correctly extracted 304/342 (88.9%) and 1066/1522 (70.0%) nucleotide sequences, respectively, together with a predicted targeting/non-targeting status. Seek & Blastn correctly predicted the targeting/non-targeting status of 293/304 (96.4%) and 988/1066 (92.7%) of the correctly extracted nucleotide sequences. In the two literature corpora, 38/39 (97.4%) and 31/79 (39.2%) of Seek & Blastn predictions of incorrect nucleotide sequence reagent use were correct, respectively. Combined Seek & Blastn and manual analyses identified a list of 91 misidentified nucleotide sequence reagents, which could be built upon through future studies. In summary, incorrect nucleotide sequence reagents represent an under-recognized source of error within the biomedical literature, and fact-checking tools such as Seek & Blastn may help to identify papers and manuscripts affected by these errors.
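The reported rates can be rechecked from the counts given in the abstract. The following is a minimal sketch for that arithmetic only; it is not part of Seek & Blastn, and the names `REPORTED`, `rate`, and `end_to_end` are hypothetical. The end-to-end figure is derived here for illustration and is not reported in the abstract.

```python
# Counts copied from the abstract, as (correct, total) pairs.
# Corpus order: previously described (48 papers), unknown (155 papers).
REPORTED = {
    "extraction": [(304, 342), (1066, 1522)],
    "status_prediction": [(293, 304), (988, 1066)],
    "incorrect_use_prediction": [(38, 39), (31, 79)],
}


def rate(correct: int, total: int) -> float:
    """Return correct/total as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


def end_to_end(status_correct: int, sequences_total: int) -> float:
    """Derived figure (not in the abstract): share of all sequences that
    were both correctly extracted and given a correct status."""
    return rate(status_correct, sequences_total)


if __name__ == "__main__":
    for step, pairs in REPORTED.items():
        for corpus, (correct, total) in zip(("described", "unknown"), pairs):
            print(f"{step:26s} {corpus:10s} {correct}/{total} = {rate(correct, total)}%")
    # Derived: correct status predictions over all sequences in each corpus.
    print("end-to-end described:", end_to_end(293, 342), "%")
    print("end-to-end unknown:  ", end_to_end(988, 1522), "%")
```

Each step is conditioned on the previous one: the status-prediction denominators (304 and 1066) are the correctly extracted sequences, not all sequences in the corpus.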