Computational Biology, St Jude Children's Research Hospital, Memphis, TN 38105, USA.
Bioinformatics. 2022 Jan 3;38(2):549-551. doi: 10.1093/bioinformatics/btab601.
Small insertions and deletions (indels) in nucleotide sequence may be represented differently by different mapping algorithms and variant callers, or depending on the flanking sequence context. Representational ambiguity is especially pronounced for complex indels, complicating comparisons between multiple mappings and call sets. Complex indels may additionally suffer from incomplete allele representation, potentially leading to critical misannotation of variant effect. We present indelPost, a Python library that harmonizes these ambiguities for simple and complex indels via realignment and read-based phasing. We demonstrate that indelPost enables accurate analysis of ambiguous data and can derive the correct complex indel alleles from the simple indel predictions provided by standard small variant detectors, with improved performance over a specialized tool for complex indel analysis.
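The following minimal Python sketch shows why the same simple indel can have several valid representations in a repetitive flanking context. It applies a generic trim-and-extend left-normalization procedure (the kind of normalization performed by tools such as vt normalize) to VCF-style alleles that carry an anchor base. This is not indelPost's API or its realignment method: the function name `normalize`, the 0-based coordinates, and the toy reference `GCACACAT` are hypothetical choices for illustration. Left-normalization only reconciles simple indels and does not recover complex indel alleles, which the abstract addresses through realignment and read-based phasing.

```python
def normalize(seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Left-normalize a VCF-style variant (hypothetical helper for illustration).

    seq: reference sequence; pos: 0-based start of `ref` within `seq`.
    """
    if seq[pos:pos + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference sequence")

    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, prepend the preceding reference base.
        if (not ref or not alt) and pos > 0:
            pos -= 1
            base = seq[pos]
            ref, alt = base + ref, base + alt
            changed = True

    # Remove redundant shared leading bases, keeping at least one anchor base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


ref_seq = "GCACACAT"
# Two different placements of the same "CA" deletion inside the CA repeat.
right = normalize(ref_seq, 4, "ACA", "A")
middle = normalize(ref_seq, 2, "ACA", "A")
print(right, middle, right == middle)
```

Both calls collapse to the same leftmost representation, `(0, 'GCA', 'G')`, which is why naive string comparison of call sets can disagree even when the underlying indel is identical.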
indelPost is freely available at: https://github.com/stjude/indelPost.
Supplementary data are available at Bioinformatics online.