GBScleanR: robust genotyping error correction using a hidden Markov model with error pattern recognition.

Affiliations

Institute of Plant Science and Resources, Okayama University, Chu-oh 2-20-1, Kurashiki, Okayama 710-0046, Japan.

Bioscience and Biotechnology Center, Nagoya University, Furo-cho, Chikusa, Nagoya, Aichi 464-8601, Japan.

Publication information

Genetics. 2023 May 26;224(2). doi: 10.1093/genetics/iyad055.

Abstract

Reduced-representation sequencing (RRS) provides cost-effective and time-saving genotyping platforms. Despite the outstanding advantage of RRS in throughput, the obtained genotype data usually contain a large number of errors. Several error correction methods employing the hidden Markov model (HMM) have been developed to overcome these issues. These methods assume that markers have a uniform error rate with no bias in the allele read ratio. However, bias does occur because of uneven amplification of genomic fragments and read mismapping. In this paper, we introduce an error correction tool, GBScleanR, which enables robust and precise error correction for noisy RRS-based genotype data by incorporating marker-specific error rates into the HMM. The results indicate that GBScleanR improves the accuracy by more than 25 percentage points at maximum compared to the existing tools in simulation data sets and achieves the most reliable genotype estimation in real data even with error-prone markers.
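To make the idea of marker-specific error rates concrete, here is a minimal toy sketch in Python. It is not GBScleanR's actual model or API: the three-state genotype HMM, the binomial emission, the function names (`log_binom`, `emission`, `viterbi`), and all numeric values are assumptions chosen purely for illustration. Each marker gets its own error rate and its own expected reference-read ratio in heterozygotes, instead of one uniform value shared by all markers.

```python
import math

def log_binom(ref, alt, p_ref):
    """Log binomial probability of observing ref/alt read counts."""
    p_ref = min(max(p_ref, 1e-9), 1 - 1e-9)  # keep log() finite
    return (math.lgamma(ref + alt + 1) - math.lgamma(ref + 1)
            - math.lgamma(alt + 1)
            + ref * math.log(p_ref) + alt * math.log(1 - p_ref))

def emission(ref, alt, err, bias):
    """Log-likelihoods for states 0 (ref/ref), 1 (ref/alt), 2 (alt/alt).

    err  : hypothetical per-marker probability that a read shows the wrong allele
    bias : hypothetical per-marker reference-read ratio in heterozygotes
    """
    return [log_binom(ref, alt, 1 - err),
            log_binom(ref, alt, bias),
            log_binom(ref, alt, err)]

def viterbi(reads, errs, biases, switch=0.01):
    """Most likely genotype path along one chromosome (toy model)."""
    log_stay = math.log(1 - switch)
    log_move = math.log(switch / 2)
    first = emission(reads[0][0], reads[0][1], errs[0], biases[0])
    score = [math.log(1 / 3) + e for e in first]
    back = []
    for (ref, alt), err, bias in zip(reads[1:], errs[1:], biases[1:]):
        emit = emission(ref, alt, err, bias)
        new_score, pointers = [], []
        for s in range(3):
            cands = [score[p] + (log_stay if p == s else log_move)
                     for p in range(3)]
            best = max(range(3), key=cands.__getitem__)
            new_score.append(cands[best] + emit[s])
            pointers.append(best)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(3), key=score.__getitem__)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Five markers as (ref reads, alt reads); the middle one is error-prone.
reads = [(10, 0), (10, 0), (5, 5), (10, 0), (10, 0)]
uniform = viterbi(reads, [0.01] * 5, [0.5] * 5)
specific = viterbi(reads, [0.01, 0.01, 0.4, 0.01, 0.01], [0.5] * 5)
print(uniform, specific)
```

With these made-up numbers, the uniform-error model calls the middle marker heterozygous, while the model that knows this marker is error-prone keeps it reference-homozygous like its neighbors. This only illustrates why a uniform error rate can mislead an HMM at error-prone markers. How GBScleanR estimates and applies marker-specific error rates is described in the paper itself.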

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a05e/10213493/502807d64e61/iyad055f1.jpg
