Graph-based analysis of DNA sequence comparison in closed cotton species: A generalized method to unveil genetic connections.

Affiliations

Institute of Mathematics, Khwaja Fareed University of Engineering & Information Technology, Rahim Yar Khan, Punjab, Pakistan.

Department of Mathematics and Statistics, Institute of Southern Punjab, Multan, Punjab, Pakistan.

Publication information

PLoS One. 2024 Sep 17;19(9):e0306608. doi: 10.1371/journal.pone.0306608. eCollection 2024.

Abstract

Graph theory provides a systematic way to model and analyse complex biological data, which makes it an effective bioinformatics tool. The number of DNA sequences in public databases is growing rapidly, and detecting similarities between DNA sequences is crucial for determining the origin of a species and identifying homologous sequences. Accurate alignment-free measures of sequence similarity are therefore needed, and obtaining them has been one of the main challenges facing computational biologists. The present study proposes a graph-theoretic mathematical technique for comparing DNA sequences. Each DNA sequence was divided into pairs of nucleotides, from which weighted loop digraphs and the corresponding weighted vectors were computed. Sequence similarity was then assessed with the Cosine, Correlation, and Jaccard distance measures. To validate the method, DNA segments from the genomes of ten cotton species were tested, and K-means clustering was performed to evaluate the efficacy of the proposed methodology. This paper presents a proof-of-concept implementation of the proposed distance-matrix technique on a given data set; further optimisation of the approach is expected to yield remarkable results and move it towards 100% accurate results. More broadly, the paper highlights graph theory as an effective tool for biological data analysis and sequence comparison in bioinformatics.
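The pipeline summarised above (nucleotide pairs, then a weighted loop digraph, then a weighted vector, then a distance matrix, then K-means) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The digraph is assumed to have the four bases as vertices, with each nucleotide pair XY adding weight 1 to the arc X->Y (a loop when X = Y). Jaccard is taken in its weighted (Ruzicka) form because the vectors hold counts. K-means (plain Lloyd's algorithm with a deterministic farthest-point start) is assumed to run on the rows of the distance matrix. All function names and the toy sequences are hypothetical.

```python
import math
from itertools import product

# Vertex set of the loop digraph and a fixed arc order: AA, AC, ..., TT (16 arcs, 4 of them loops).
BASES = "ACGT"
EDGES = ["".join(p) for p in product(BASES, repeat=2)]


def weighted_vector(seq, step=1):
    """Arc weights of a loop digraph on {A, C, G, T}, in EDGES order.

    Each nucleotide pair XY read from `seq` adds 1 to the arc X->Y (a loop if X == Y).
    step=1 reads overlapping pairs, step=2 non-overlapping pairs (assumed scheme).
    """
    counts = dict.fromkeys(EDGES, 0)
    seq = seq.upper()
    for i in range(0, len(seq) - 1, step):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing N or other ambiguity codes
            counts[pair] += 1
    return [counts[e] for e in EDGES]


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def correlation_distance(u, v):
    # 1 - Pearson r, i.e. the cosine distance of the mean-centred vectors.
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return cosine_distance([a - mu for a in u], [b - mv for b in v])


def jaccard_distance(u, v):
    # Weighted (Ruzicka) Jaccard distance for non-negative weight vectors.
    num = sum(min(a, b) for a, b in zip(u, v))
    den = sum(max(a, b) for a, b in zip(u, v))
    return 1.0 - num / den


def distance_matrix(vectors, metric):
    return [[metric(u, v) for v in vectors] for u in vectors]


def kmeans(points, k, iters=100):
    """Lloyd's K-means with a deterministic farthest-point initialisation."""
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Start from the first point, then repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sqdist(p, c) for c in centroids)))

    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy sequences for illustration only; not the cotton data used in the study.
    seqs = {"s1": "ATATATATGCGC", "s2": "ATATATATGCGG",
            "s3": "GCGCGCGCATAT", "s4": "GCGCGCGCATTT"}
    D = distance_matrix([weighted_vector(s) for s in seqs.values()], cosine_distance)
    print(dict(zip(seqs, kmeans(D, 2))))  # {'s1': 0, 's2': 0, 's3': 1, 's4': 1}
```

Passing `step=2` reads the sequence as non-overlapping pairs, the other plausible reading of "divided into pairs of nucleotides". Swapping `cosine_distance` for `correlation_distance` or `jaccard_distance` yields the other two distance matrices.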
