College of Intelligence and Computing, Tianjin University.
School of Life Sciences, North China University of Science and Technology.
Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa284.
With the development of high-throughput sequencing technology, the volume of genomic sequence data has increased exponentially over the last decade. To decode these new genomic data, machine learning methods have been introduced for genome annotation and analysis. Because most machine learning methods require fixed-length numeric input, biological sequences must be represented as fixed-length numeric vectors. In this representation procedure, the physicochemical properties of k-tuple nucleotides are important information. However, the values of these properties are scattered across different resources. To facilitate studies on genomic sequences, we developed the first comprehensive database, namely KNIndex (https://knindex.pufengdu.org), for depositing and visualizing physicochemical properties of k-tuple nucleotides. Currently, the KNIndex database contains 182 properties: one for mononucleotides (DNA), 169 for dinucleotides (147 for DNA and 22 for RNA) and 12 for trinucleotides (DNA). KNIndex also provides a user-friendly web-based interface for browsing, querying, visualizing and downloading the physicochemical properties of k-tuple nucleotides. With the built-in conversion and visualization functions, users can display DNA/RNA sequences as curves of multiple physicochemical properties. We hope that KNIndex will facilitate related studies in computational biology.
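As a rough illustration of the representation procedure described above, the following Python sketch converts a DNA sequence into a curve of per-dinucleotide property values and then summarizes one or more such curves as a fixed-length vector. It is a minimal sketch under stated assumptions, not the KNIndex implementation: the function names `property_curve` and `fixed_length_vector` are hypothetical, the table `TOY_PROPERTY` holds placeholder numbers rather than real physicochemical values from the database, and averaging each curve is only one of several possible ways to obtain a fixed-length representation.

```python
from itertools import product

# Placeholder property table (assumption): arbitrary numbers, NOT real KNIndex
# values. A real table would map each dinucleotide to a measured property.
TOY_PROPERTY = {
    "".join(pair): float(index)
    for index, pair in enumerate(product("ACGT", repeat=2))
}


def property_curve(seq, table, k=2):
    """Map every overlapping k-tuple in seq to its property value."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    return [table[seq[i:i + k]] for i in range(len(seq) - k + 1)]


def fixed_length_vector(seq, tables, k=2):
    """Average each property curve, giving one number per property table.

    The result has len(tables) entries regardless of the sequence length.
    """
    vector = []
    for table in tables:
        curve = property_curve(seq, table, k)
        vector.append(sum(curve) / len(curve))
    return vector


if __name__ == "__main__":
    print(property_curve("ACGT", TOY_PROPERTY))
    print(fixed_length_vector("ACGT", [TOY_PROPERTY]))
```

The curve keeps positional information and has length L - k + 1 for a sequence of length L, which suits visualization. The averaged vector discards position but has the same length for every sequence, which is what most machine learning methods expect.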