Exploring geometry of genome space via Grassmann manifolds.

Authors

Li Xiaoguang, Zhou Tao, Feng Xingdong, Yau Shing-Tung, Yau Stephen S-T

Affiliations

School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai 200433, China.

Department of Mathematical Sciences, Tsinghua University, Beijing 100084, China.

Publication information

Innovation (Camb). 2024 Jul 22;5(5):100677. doi: 10.1016/j.xinn.2024.100677. eCollection 2024 Sep 9.

DOI: 10.1016/j.xinn.2024.100677
PMID: 39206218
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11350263/
Abstract

It is important to understand the geometry of genome space in biology. After transforming genome sequences into frequency matrices of the chaos game representation (FCGR), we regard a genome sequence as a point in a suitable Grassmann manifold by analyzing the column space of the corresponding FCGR. To assess the sequence similarity, we employ the generalized Grassmannian distance, an intrinsic geometric distance that differs from the traditional Euclidean distance used in the classical k-mer frequency-based methods. With this method, we constructed phylogenetic trees for various genome datasets, including influenza A virus hemagglutinin gene, Orthocoronavirinae genome, and SARS-CoV-2 complete genome sequences. Our comparative analysis with multiple sequence alignment and alignment-free methods for large-scale sequences revealed that our method, which employs the subspace distance between the column spaces of different FCGRs (FCGR-SD), outperformed its competitors in terms of both speed and accuracy. In addition, we used low-dimensional visualization of the SARS-CoV-2 genome sequences and spike protein nucleotide sequences with our methods, resulting in some intriguing findings. We not only propose a novel and efficient algorithm for comparing genome sequences but also demonstrate that genome data have some intrinsic manifold structures, providing a new geometric perspective for molecular biology studies.
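The pipeline the abstract describes (sequence to FCGR, FCGR to column space, column space to a distance between subspaces) can be sketched as follows. This is a minimal illustration and not the authors' implementation. The abstract does not specify the corner layout of the chaos game, the k-mer length, the subspace dimension `r`, or the exact form of the generalized Grassmannian distance, so all of these are assumptions here. The distance used is the square root of the sum of squared principal angles, taken over the smaller of the two subspace dimensions, which is one common definition for subspaces of possibly different dimensions. The function names `fcgr`, `column_space_basis`, and `subspace_distance` are hypothetical.

```python
import numpy as np

# Corner assignment is one common CGR convention (an assumption, not from the paper).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k):
    """Return the 2^k x 2^k matrix of normalized k-mer frequencies."""
    n = 2 ** k
    mat = np.zeros((n, n))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(ch not in CORNERS for ch in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        row = col = 0
        # Process the last base first so that it selects the coarsest quadrant.
        for ch in reversed(kmer):
            x, y = CORNERS[ch]
            col = 2 * col + x
            row = 2 * row + y
        mat[row, col] += 1
    total = mat.sum()
    return mat / total if total else mat


def column_space_basis(mat, r):
    """Orthonormal basis of the dominant r-dimensional column space (r is assumed)."""
    u, _, _ = np.linalg.svd(mat, full_matrices=False)
    return u[:, :r]


def subspace_distance(q1, q2):
    """Distance from principal angles between the spans of q1 and q2.

    q1 and q2 must have orthonormal columns; they may have different
    numbers of columns, in which case min(r1, r2) angles are used.
    """
    s = np.linalg.svd(q1.T @ q2, compute_uv=False)
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return float(np.sqrt(np.sum(theta ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_a = "".join(rng.choice(list("ACGT"), size=500))
    seq_b = "".join(rng.choice(list("ACGT"), size=500))
    qa = column_space_basis(fcgr(seq_a, 3), 2)
    qb = column_space_basis(fcgr(seq_b, 3), 2)
    print(subspace_distance(qa, qb))
```

Computing `subspace_distance` for every pair of sequences yields a distance matrix, which a standard distance-based tree-building method could then turn into a phylogenetic tree. Which tree-building method the authors used is not stated in the abstract.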

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52fb/11350263/f1f72353e024/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52fb/11350263/9836eced3712/fx1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52fb/11350263/90357a9d5254/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52fb/11350263/6302d4c42e04/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52fb/11350263/4e52239d82de/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/52fb/11350263/c7771000439f/gr4.jpg

Similar articles

1. Exploring geometry of genome space via Grassmann manifolds.
   Innovation (Camb). 2024 Jul 22;5(5):100677. doi: 10.1016/j.xinn.2024.100677. eCollection 2024 Sep 9.
2. Applying frequency chaos game representation with perceptual image hashing to gene sequence phylogenetic analyses.
   J Mol Graph Model. 2021 Sep;107:107942. doi: 10.1016/j.jmgm.2021.107942. Epub 2021 May 23.
3. Accurate and fast clade assignment via deep learning and frequency chaos game representation.
   Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giac119.
4. A phylogenetic analysis of the brassicales clade based on an alignment-free sequence comparison method.
   Front Plant Sci. 2012 Aug 29;3:192. doi: 10.3389/fpls.2012.00192. eCollection 2012.
5. Image Set Classification Using a Distance-Based Kernel Over Affine Grassmann Manifold.
   IEEE Trans Neural Netw Learn Syst. 2021 Mar;32(3):1082-1095. doi: 10.1109/TNNLS.2020.2980059. Epub 2021 Mar 1.
6. Mahalanobis distance on extended Grassmann manifolds for variational pattern analysis.
   IEEE Trans Neural Netw Learn Syst. 2014 Nov;25(11):1980-90. doi: 10.1109/TNNLS.2014.2301178.
7. Chaos game representation and its applications in bioinformatics.
   Comput Struct Biotechnol J. 2021 Nov 10;19:6263-6271. doi: 10.1016/j.csbj.2021.11.008. eCollection 2021.
8. Domain Adaptation as Optimal Transport on Grassmann Manifolds.
   IEEE Trans Neural Netw Learn Syst. 2023 Oct;34(10):7196-7209. doi: 10.1109/TNNLS.2021.3139119. Epub 2023 Oct 6.
9. An improved model for whole genome phylogenetic analysis by Fourier transform.
   J Theor Biol. 2015 Oct 7;382:99-110. doi: 10.1016/j.jtbi.2015.06.033. Epub 2015 Jul 4.
10. Kernel Methods on Riemannian Manifolds with Gaussian RBF Kernels.
    IEEE Trans Pattern Anal Mach Intell. 2015 Dec;37(12):2464-77. doi: 10.1109/TPAMI.2015.2414422.

Cited by

1. Application of multigene panel testing for bleeding, thrombotic, and platelet disorders in patients and the general population in China.
   Mol Biomed. 2025 Jun 9;6(1):39. doi: 10.1186/s43556-025-00283-6.
2. Methyl-GP: accurate generic DNA methylation prediction based on a language model and representation learning.
   Nucleic Acids Res. 2025 Mar 20;53(6). doi: 10.1093/nar/gkaf223.
