Temporal convolutional network for a Fast DNA mutation detection in breast cancer data.

Affiliations

Bandung Institute of Technology, Doctoral Program of Electrical Engineering and Informatics, School of Electrical and Information Engineering, Bandung, Indonesia.

School of Computing, Telkom University, Bandung, Indonesia.

Publication information

PLoS One. 2023 May 25;18(5):e0285981. doi: 10.1371/journal.pone.0285981. eCollection 2023.

Abstract

Early detection of breast cancer can be achieved through mutation detection in DNA sequences, which can be obtained from patient blood samples. Mutation detection can be performed using alignment or machine learning techniques. However, alignment techniques require reference sequences, and existing machine learning techniques still cannot predict the mutation index (the position of the mutation in the sequence) and require supporting tools. Therefore, this research proposes a Temporal Convolutional Network (TCN) model that detects the mutation type and mutation index faster, without reference sequences or supporting tools. The architecture of the proposed TCN model is specifically designed for sequence labeling tasks on DNA sequence data. This allows the mutation type of each nucleotide in the sequence to be detected, and if a nucleotide has a mutation, its index can be obtained. The proposed model also uses 2-mers and 3-mers mapping techniques to improve detection performance. In the tests carried out, the proposed TCN model achieved its highest F1-score of 0.9443 on the COSMIC dataset and 0.9629 on the RSCM dataset. Additionally, the proposed TCN model detects the mutation index six times faster than a BiLSTM model. Furthermore, the proposed model detects mutation types and indices from the patient's DNA sequence alone, without the need for reference sequences or other additional tools.
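
The abstract describes two ideas that a small example may make concrete: mapping the DNA sequence to 2-mers or 3-mers before it reaches the model, and reading mutation positions off a per-nucleotide label sequence. The following is only a minimal sketch under stated assumptions, not the authors' implementation. The function names, the stride-1 sliding window, the zero padding at the tail, and the label names (`"N"` for normal, `"SUB"` for a substitution) are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def build_kmer_vocab(k):
    """Map every possible k-mer over A, C, G, T to an integer id.

    Id 0 is reserved for padding and unknown k-mers (an assumption of this sketch).
    """
    return {"".join(p): i + 1 for i, p in enumerate(product(NUCLEOTIDES, repeat=k))}


def kmer_encode(seq, k, vocab):
    """Encode a sequence as overlapping k-mers with stride 1.

    The tail is padded with id 0 so the output keeps one token per nucleotide,
    which a per-position (sequence labeling) model needs.
    """
    ids = [vocab.get(seq[i:i + k], 0) for i in range(len(seq) - k + 1)]
    return ids + [0] * (len(seq) - len(ids))


def mutation_indices(labels, normal="N"):
    """Return (index, label) pairs for positions not labeled as normal.

    `labels` stands in for the per-nucleotide output of a trained model.
    """
    return [(i, lab) for i, lab in enumerate(labels) if lab != normal]


vocab2 = build_kmer_vocab(2)
tokens = kmer_encode("ACGT", 2, vocab2)          # one token per nucleotide
found = mutation_indices(["N", "N", "SUB", "N"])  # hypothetical model output
```

Because the model emits one label per nucleotide, the mutation position falls out of the label sequence directly, which is consistent with the abstract's claim that no reference sequence or alignment step is needed at prediction time.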
