Superstatistics Applied to Cucurbitaceae DNA Sequences.

Author information

Costa M O, Silva R, de Lima M M F, Anselmo D H A L

Affiliations

Departamento de Física, Universidade Federal do Rio Grande do Norte, Natal 59072-970, Brazil.

Departamento de Física, Universidade do Estado do Rio Grande do Norte, Mossoró 59610-210, Brazil.

Publication information

Entropy (Basel). 2024 Sep 25;26(10):819. doi: 10.3390/e26100819.

Abstract

Short- and long-range statistical correlations are essential features of genomic sequences: the correlations are long-range in introns and short-range in exons. In this study, we employed superstatistics to investigate correlations and fluctuations in the distribution of nucleotide sequence lengths of the Cucurbitaceae family. We established a time series of exon sizes to probe these correlations and fluctuations. We used data from the National Center for Biotechnology Information (NCBI) gene database to extract the temporal evolution of exon sizes, measured in base pairs (bp). To assess the model's viability, we used a timescale extraction method to determine the statistical properties of our time series, including the local distribution and fluctuations, which yield exon size distributions based on the q-Gamma and inverse q-Gamma distributions. From a Bayesian statistics standpoint, both distributions capture the correlations and fluctuations in the data very well.
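The abstract does not spell out the timescale extraction procedure. As a rough illustration only, the sketch below assumes the kurtosis-matching approach commonly used in the superstatistics literature (Beck, Cohen and Swinney): the series is split into windows, and the window length whose average local kurtosis is closest to the Gaussian value of 3 is taken as the long timescale. The function names, the use of non-overlapping windows, and the synthetic series are assumptions made for this example, not the authors' implementation or data.

```python
import numpy as np


def _blocks(u, window):
    """Split a 1-D series into non-overlapping windows, dropping the remainder."""
    u = np.asarray(u, dtype=float)
    n_win = len(u) // window
    return u[: n_win * window].reshape(n_win, window)


def local_kurtosis(u, window):
    """Average kurtosis of the series within windows of the given length."""
    blocks = _blocks(u, window)
    centered = blocks - blocks.mean(axis=1, keepdims=True)
    m2 = (centered ** 2).mean(axis=1)
    m4 = (centered ** 4).mean(axis=1)
    valid = m2 > 0  # skip constant windows, where kurtosis is undefined
    return float(np.mean(m4[valid] / m2[valid] ** 2))


def extract_timescale(u, windows, target=3.0):
    """Return the candidate window length whose local kurtosis is closest to target."""
    kappas = np.array([local_kurtosis(u, w) for w in windows])
    best = int(np.argmin(np.abs(kappas - target)))
    return windows[best], kappas


def local_beta(u, window):
    """Inverse local variance in each window (the fluctuating parameter beta)."""
    var = _blocks(u, window).var(axis=1)
    return 1.0 / var[var > 0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for an exon-size series (hypothetical data): Gaussian
    # blocks of 50 points whose inverse variance is Gamma-distributed.
    scales = rng.gamma(shape=4.0, scale=1.0, size=200) ** -0.5
    series = np.concatenate([rng.normal(0.0, s, size=50) for s in scales])
    T, kappas = extract_timescale(series, windows=list(range(5, 101, 5)))
    print("estimated window length:", T)
    print("number of local beta values:", local_beta(series, T).size)
```

In a superstatistical analysis, a histogram of the values returned by `local_beta` would then be compared with candidate distributions for the fluctuating parameter, such as Gamma or inverse Gamma.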

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/206d/11507824/706a4b84512b/entropy-26-00819-g0A1.jpg
