Density estimation for ordinal biological sequences and its applications.

Author information

Department of Physics, <a href="https://ror.org/0028v3876">National Chung Cheng University</a>, Chiayi 62102, Taiwan, Republic of China.

Department of Biology, <a href="https://ror.org/02y3ad647">University of Florida</a>, Gainesville, Florida 32611, USA.

Publication information

Phys Rev E. 2024 Oct;110(4-1):044408. doi: 10.1103/PhysRevE.110.044408.

Abstract

Biological sequences do not occur at random. Instead, they appear with particular frequencies that reflect properties of the associated system or phenomenon. Knowing how biological sequences are distributed in sequence space is thus a natural first step toward understanding the underlying mechanisms. Here we propose a method for inferring the probability distribution from which a sample of biological sequences was drawn, for the case where the sequences are composed of elements that admit a natural ordering. Our method is based on Bayesian field theory, a physics-based machine learning approach, and can be regarded as a nonparametric extension of the traditional maximum entropy estimate. As an example, we use it to analyze the aneuploidy data pertaining to gliomas from The Cancer Genome Atlas project. In addition, we demonstrate two follow-up analyses that can be performed with the resulting probability distribution. The first investigates the associations among the sequence sites, which provides a way to infer the governing biological grammar. The second studies the global geometry of the probability landscape, which allows us to look at the problem from an evolutionary point of view. Together, these analyses show how a sample of sequences can reveal how a real-world biological system or phenomenon works.
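
To make the idea of a distribution over ordinal sequences concrete, here is a minimal sketch. It is not the Bayesian field theory method of the paper: it only computes the naive frequency estimate that a density estimator would smooth, plus a simple covariance between two sites as a crude association measure. The encoding (-1 for loss, 0 for neutral, +1 for gain at each site), the toy sample, and the function names are all illustrative assumptions.

```python
from collections import Counter


def empirical_distribution(sequences):
    """Return {sequence: relative frequency} for a list of tuples."""
    counts = Counter(sequences)
    total = len(sequences)
    return {seq: n / total for seq, n in counts.items()}


def site_covariance(sequences, i, j):
    """Covariance (dividing by N) between the ordinal values at sites i and j."""
    n = len(sequences)
    mean_i = sum(s[i] for s in sequences) / n
    mean_j = sum(s[j] for s in sequences) / n
    return sum((s[i] - mean_i) * (s[j] - mean_j) for s in sequences) / n


# Hypothetical toy sample: 3 sites, each with ordinal state -1, 0, or +1.
sample = [
    (1, -1, 0),
    (1, -1, 0),
    (0, 0, 0),
    (1, -1, 1),
]

probs = empirical_distribution(sample)   # naive frequency estimate
cov01 = site_covariance(sample, 0, 1)    # association between sites 0 and 1
```

In this toy sample the covariance between sites 0 and 1 is negative, meaning a gain at site 0 tends to accompany a loss at site 1. The frequency estimate assigns zero probability to every unobserved sequence, which is the limitation that a smoothed, nonparametric estimate is meant to address.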



