Gene expression data classification using topology and machine learning models.

Affiliations

Department of Computer Science, Purdue University, West Lafayette, IN, USA.

Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA.

Publication information

BMC Bioinformatics. 2022 May 20;22(Suppl 10):627. doi: 10.1186/s12859-022-04704-z.

DOI:10.1186/s12859-022-04704-z
PMID:35596135
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9121583/
Abstract

BACKGROUND

Interpretation of high-throughput gene expression data continues to require mathematical tools in data analysis that recognize the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high-dimensional constructs. In this work, we utilize some recent developments in TDA to curate gene expression data. Our work differs from its predecessors in two respects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors, which are then used for classification. In contrast, this work uses TDA to curate relevant features and obtain somewhat better representatives. These representatives of the entire data facilitate better comprehension of the phenotype labels. (2) Most earlier works employ barcodes obtained using topological summaries as fingerprints for the data. Even though they are stable signatures, there exists no direct mapping between the data and said barcodes.
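To make the "traditional TDA pipeline" mentioned above concrete, the following minimal sketch computes a 0-dimensional persistence barcode for a small point cloud and turns it into a few summary features. It is not the authors' method (which works with representative cycles) and is only an illustration under stated assumptions: the function names `h0_barcode` and `barcode_features`, the choice of summary statistics, and the toy samples are all hypothetical. The barcode is obtained with a union-find pass over edges sorted by length, which corresponds to connected-component persistence of a Vietoris-Rips filtration.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Return 0-dimensional persistence bars (birth, death) for a point cloud.

    Every point starts its own component at scale 0. A component dies at the
    length of the edge that merges it into another one. One bar never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    bars = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj              # merge the two components
            bars.append((0.0, length))   # one component dies at this scale
    bars.append((0.0, math.inf))         # the surviving component
    return bars


def barcode_features(bars):
    """Summarize finite bars as [count, longest lifetime, total lifetime]."""
    finite = [death - birth for birth, death in bars if death != math.inf]
    return [len(finite), max(finite, default=0.0), sum(finite)]


# Hypothetical toy samples: two well-separated groups in the plane.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
bars = h0_barcode(samples)
features = barcode_features(bars)
```

In a pipeline of the kind the abstract contrasts itself with, a vector such as `features` would be appended to each sample's existing feature vector before classification. The long finite bar in this toy example reflects the gap between the two groups, but nothing in the barcode says which points produced it, which is the missing data-to-barcode mapping the abstract points out.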

RESULTS

The topology-relevant curated data that we obtain improves both shallow-learning and deep-learning-based supervised classification. We further show that the representative cycles we compute have an unsupervised inclination towards phenotype labels. This work thus shows that topological signatures are able to comprehend gene expression levels and classify cohorts accordingly.

CONCLUSIONS

In this work, we engender representative persistent cycles to discern the gene expression data. These cycles allow us to directly procure genes entailed in similar processes.

Similar articles

1. Gene expression data classification using topology and machine learning models.
   BMC Bioinformatics. 2022 May 20;22(Suppl 10):627. doi: 10.1186/s12859-022-04704-z.
2. A Survey of Topological Machine Learning Methods.
   Front Artif Intell. 2021 May 26;4:681108. doi: 10.3389/frai.2021.681108. eCollection 2021.
3. TDA-Net: Fusion of Persistent Homology and Deep Learning Features for COVID-19 Detection From Chest X-Ray Images.
   Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:4115-4119. doi: 10.1109/EMBC46164.2021.9629828.
4. The topology of data: opportunities for cancer research.
   Bioinformatics. 2021 Oct 11;37(19):3091-3098. doi: 10.1093/bioinformatics/btab553.
5. Topological data analysis in biomedicine: A review.
   J Biomed Inform. 2022 Jun;130:104082. doi: 10.1016/j.jbi.2022.104082. Epub 2022 May 1.
6. The shape of things to come: Topological data analysis and biology, from molecules to organisms.
   Dev Dyn. 2020 Jul;249(7):816-833. doi: 10.1002/dvdy.175. Epub 2020 Apr 13.
7. MaTiLDA: An Integrated Machine Learning and Topological Data Analysis Platform for Brain Network Dynamics.
   Pac Symp Biocomput. 2024;29:65-80.
8. A topological data analysis based classification method for multiple measurements.
   BMC Bioinformatics. 2020 Jul 29;21(1):336. doi: 10.1186/s12859-020-03659-3.
9. Topological data analysis in medical imaging: current state of the art.
   Insights Imaging. 2023 Apr 1;14(1):58. doi: 10.1186/s13244-023-01413-w.
10. Hybrid Topological Data Analysis and Deep Learning for Basal Cell Carcinoma Diagnosis.
    J Imaging Inform Med. 2024 Feb;37(1):92-106. doi: 10.1007/s10278-023-00924-8. Epub 2024 Jan 12.

Cited by

1. PredCoffee: A binary classification approach specifically for coffee odor.
   iScience. 2024 May 21;27(6):110041. doi: 10.1016/j.isci.2024.110041. eCollection 2024 Jun 21.
