Aberration-corrected ultrafine analysis of miRNA reads at single-base resolution: a k-mer lattice approach.

Affiliations

Data Science Institute, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia.

School of Biomedical Engineering, Faculty of Engineering and IT, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia.

Publication information

Nucleic Acids Res. 2021 Oct 11;49(18):e106. doi: 10.1093/nar/gkab610.

DOI:10.1093/nar/gkab610

PMID:34291293

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8631080/

Abstract

Raw sequencing reads of miRNAs contain machine-made substitution errors and, in some cases, insertions and deletions (indels). Although the error rate can be as low as 0.1%, precise correction of these errors is critically important, because isoform variation analysis at single-base resolution (such as novel isomiR discovery, characterization of editing events, differential expression analysis, and tissue-specific isoform identification) is very sensitive to the base positions and copy counts of the reads. Existing error correction methods do not work for miRNA sequencing data, because the length and per-read coverage of miRNA reads differ from those of DNA or mRNA sequencing reads. We present a novel lattice structure combining k-mers, (k - 1)-mers and (k + 1)-mers to address this problem. The method is particularly effective at correcting indel errors. Extensive tests on datasets with known ground-truth errors demonstrate that the method removes almost all of the errors without introducing any new ones, improving data quality from one error per 50 reads to one error per 1300 reads. Studies on experimental miRNA sequencing datasets show that errors are often rectified at the 5' ends and in the seed regions of the reads, and that correction leads to remarkable changes in miRNA isoform abundance, the number of singleton reads, overall entropy, isomiR families, tissue-specific miRNAs, and rare-miRNA quantities.
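The abstract does not spell out how the lattice is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the three layers are plain substring counts for lengths k - 1, k and k + 1, and that a rare k-mer is linked to frequent neighbors one edit away: a substitution within the k layer, one base removed in the (k - 1) layer, or one base added in the (k + 1) layer. The function names and the `min_support` threshold are hypothetical.

```python
from collections import Counter

ALPHABET = "ACGT"


def count_mers(reads, n):
    """Count every length-n substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - n + 1):
            counts[read[i:i + n]] += 1
    return counts


def build_layers(reads, k):
    """Three count tables, keyed by substring length: k-1, k, k+1."""
    return {n: count_mers(reads, n) for n in (k - 1, k, k + 1)}


def substitution_neighbors(kmer):
    """All k-mers that differ from `kmer` at exactly one position."""
    return {
        kmer[:i] + alt + kmer[i + 1:]
        for i, base in enumerate(kmer)
        for alt in ALPHABET
        if alt != base
    }


def shorter_neighbors(kmer):
    """All (k-1)-mers obtained by removing one base (undoes an insertion)."""
    return {kmer[:i] + kmer[i + 1:] for i in range(len(kmer))}


def longer_neighbors(kmer):
    """All (k+1)-mers obtained by adding one base (undoes a deletion)."""
    return {
        kmer[:i] + base + kmer[i:]
        for i in range(len(kmer) + 1)
        for base in ALPHABET
    }


def lattice_candidates(kmer, layers, min_support):
    """List (layer, neighbor, count) for well-supported neighbors of `kmer`.

    `min_support` is an assumed frequency cutoff, not a value from the paper.
    """
    groups = (
        ("k", substitution_neighbors(kmer)),
        ("k-1", shorter_neighbors(kmer)),
        ("k+1", longer_neighbors(kmer)),
    )
    found = []
    for layer, neighbors in groups:
        for neighbor in sorted(neighbors):
            count = layers[len(neighbor)][neighbor]  # Counter gives 0 if absent
            if count >= min_support:
                found.append((layer, neighbor, count))
    return found


if __name__ == "__main__":
    # Toy data: 20 identical reads plus one read with a T->A substitution.
    reads = ["ACGTACGT"] * 20 + ["ACGAACGT"]
    layers = build_layers(reads, 4)
    for candidate in lattice_candidates("ACGA", layers, min_support=10):
        print(candidate)
```

In this toy run the rare 4-mer `ACGA` is linked to the frequent 4-mer `ACGT`, which is the kind of evidence a corrector could use to propose a fix. How the published method scores and selects among such candidates is not described in the abstract and is not modeled here.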

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/05ee/8631080/3bf67dc7afe3/gkab610fig1.jpg

Similar articles

A hybrid and scalable error correction algorithm for indel and substitution errors of long reads.

BMC Genomics. 2019 Dec 20;20(Suppl 11):948. doi: 10.1186/s12864-019-6286-9.

QuorUM: An Error Corrector for Illumina Reads.

PLoS One. 2015 Jun 17;10(6):e0130821. doi: 10.1371/journal.pone.0130821. eCollection 2015.

Blue: correcting sequencing errors using consensus and context.

Bioinformatics. 2014 Oct;30(19):2723-32. doi: 10.1093/bioinformatics/btu368. Epub 2014 Jun 11.

ARAMIS: From systematic errors of NGS long reads to accurate assemblies.

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab170.

HISEA: HIerarchical SEed Aligner for PacBio data.

BMC Bioinformatics. 2017 Dec 19;18(1):564. doi: 10.1186/s12859-017-1953-9.

A sensitive repeat identification framework based on short and long reads.

Nucleic Acids Res. 2021 Sep 27;49(17):e100. doi: 10.1093/nar/gkab563.

Improving the sensitivity of long read overlap detection using grouped short k-mer matches.

BMC Genomics. 2019 Apr 4;20(Suppl 2):190. doi: 10.1186/s12864-019-5475-x.

EC: an efficient error correction algorithm for short reads.

BMC Bioinformatics. 2015;16 Suppl 17(Suppl 17):S2. doi: 10.1186/1471-2105-16-S17-S2. Epub 2015 Dec 7.

A comparative study of k-spectrum-based error correction methods for next-generation sequencing data analysis.

Hum Genomics. 2016 Jul 25;10 Suppl 2(Suppl 2):20. doi: 10.1186/s40246-016-0068-0.

Cited by

MicroRNAs in long COVID: roles, diagnostic biomarker potential and detection.

Hum Genomics. 2025 Aug 13;19(1):90. doi: 10.1186/s40246-025-00810-0.

Intra-Host Co-Existing Strains of SARS-CoV-2 Reference Genome Uncovered by Exhaustive Computational Search.

Viruses. 2023 Apr 26;15(5):1065. doi: 10.3390/v15051065.

A curated human cellular microRNAome based on 196 primary cell types.

Gigascience. 2022 Aug 25;11. doi: 10.1093/gigascience/giac083.
