Mink Sylvia, Attenberger Christian, Busch Yannik, Kiefer Johanna, Peter Wolfgang, Cadamuro Janne, Steiert Tim A, Franke Andre, Gassner Christoph
Central Medical Laboratories, Carinagasse 41, 6800 Feldkirch, Austria.
Institute of Translational Medicine, Private University in the Principality of Liechtenstein, 9495 Triesen, Liechtenstein.
Int J Mol Sci. 2025 Apr 7;26(7):3443. doi: 10.3390/ijms26073443.
Although second-generation sequencing yields highly accurate results, its short reads have major limitations when mapping complex genomic regions. Longer reads can resolve these regions and can additionally phase distant variants. The third-generation sequencing platform from Oxford Nanopore Technologies (ONT) currently achieves the longest reads but falls short in sequencing accuracy. In addition, deriving phased haplotypes from amplicon-based NGS data remains a complex and time-consuming task that requires extensive bioinformatic expertise. We constructed an integrative, open-access, modular data-analysis framework that automates the processing of high-throughput sequencing data from both second-generation (Illumina) and third-generation (ONT) platforms, combining the strengths of both technologies. Variant information is automatically evaluated and color-coded for discrepancies. Haplotypes are listed by frequency. All parts of the framework can be used independently. The framework's performance was validated using synthetic data and tested with real-life data by analyzing partly homologous // sequencing data from 400 blood donors.
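The two reporting steps named in the abstract, flagging discrepancies between platforms and listing haplotypes by frequency, can be illustrated with a minimal Python sketch. This is not the framework's code: the function names, the dictionary input format, and the three status labels (which stand in for the color codes) are assumptions made purely for illustration.

```python
from collections import Counter


def compare_calls(illumina, ont):
    """Label each variant position by agreement between two call sets.

    Both arguments map a position key to a genotype string. The labels are
    hypothetical stand-ins for the color codes mentioned in the abstract.
    """
    report = {}
    for pos in sorted(set(illumina) | set(ont)):
        a, b = illumina.get(pos), ont.get(pos)
        if a is None or b is None:
            report[pos] = "single-platform"  # called by only one platform
        elif a == b:
            report[pos] = "concordant"       # same genotype on both
        else:
            report[pos] = "discordant"       # genotypes disagree
    return report


def rank_haplotypes(haplotypes):
    """Return (haplotype, count, fraction) tuples, most frequent first."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return [(h, n, n / total) for h, n in counts.most_common()]


if __name__ == "__main__":
    # Toy inputs, invented for this example only.
    illumina = {"p1": "A/G", "p2": "C/C", "p3": "T/T"}
    ont = {"p1": "A/G", "p2": "C/T", "p4": "G/G"}
    for pos, status in compare_calls(illumina, ont).items():
        print(pos, status)
    for hap, n, frac in rank_haplotypes(["H1", "H2", "H1", "H3", "H1", "H2"]):
        print(hap, n, round(frac, 3))
```

Keeping the comparison and the ranking as separate functions mirrors the abstract's statement that all parts of the framework can be used independently.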