Sparse representation learning derives biological features with explicit gene weights from the Allen Mouse Brain Atlas.

Affiliations

School for Biological and Health Systems Engineering, Arizona State University, Tempe, Arizona, United States of America.

Department of Mathematics, Tufts University, Medford, Massachusetts, United States of America.

Publication information

PLoS One. 2023 Mar 6;18(3):e0282171. doi: 10.1371/journal.pone.0282171. eCollection 2023.

Abstract

Unsupervised learning methods are commonly used to detect features within transcriptomic data and ultimately derive meaningful representations of biology. However, the contributions of individual genes to any feature become convolved with each learning step, requiring follow-up analysis and validation to understand what biology a cluster on a low-dimensional plot might represent. We sought learning methods that could preserve the gene information of detected features, using the spatial transcriptomic data and anatomical labels of the Allen Mouse Brain Atlas as a test dataset with verifiable ground truth. We established metrics for accurate representation of molecular anatomy and found that sparse learning approaches were uniquely capable of generating anatomical representations and gene weights in a single learning step. Fit to labeled anatomy was highly correlated with intrinsic properties of the data, offering a means to optimize parameters without established ground truth. Once representations were derived, complementary gene lists could be further compressed to generate a low-complexity dataset, or to probe for individual features with >95% accuracy. We demonstrate the utility of sparse learning as a means to derive biologically meaningful representations from transcriptomic data and reduce the complexity of large datasets while preserving intelligible gene information throughout the analysis.
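
To make the idea of "gene weights in a single learning step" concrete, here is a minimal sketch in Python. The abstract does not name the specific sparse algorithm, so this uses an L1-penalized non-negative matrix factorization as a stand-in, not the authors' method. The data are synthetic: the two "regions", the marker gene indices, and the function name `sparse_nmf` with its parameters are all assumptions made for illustration. Each row of `H` is a feature whose entries are explicit per-gene weights, so the genes driving a feature can be read off directly.

```python
import numpy as np

def sparse_nmf(X, n_components, l1=0.1, n_iter=500, seed=0, eps=1e-9):
    """Illustrative L1-penalized NMF (multiplicative updates): X ~ W @ H.

    X: (samples x genes) non-negative expression matrix.
    W: (samples x components) loadings of each sample on each feature.
    H: (components x genes) explicit, sparse gene weights per feature.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_genes = X.shape
    W = rng.random((n_samples, n_components)) + 0.1
    H = rng.random((n_components, n_genes)) + 0.1
    for _ in range(n_iter):
        # Gene-weight update; the l1 term in the denominator shrinks
        # weakly supported weights toward zero.
        H *= (W.T @ X) / (W.T @ W @ H + l1 + eps)
        # Sample-loading update (standard multiplicative rule).
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # Give each column of W unit norm and move the scale into H,
        # so the product W @ H is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

# Synthetic stand-in for spatial expression data (hypothetical values):
# 60 "voxels" x 20 "genes", two regions with disjoint marker gene sets.
rng = np.random.default_rng(42)
n_voxels, n_genes = 60, 20
X = np.abs(rng.normal(0.0, 0.05, size=(n_voxels, n_genes)))
X[:30, 0:5] += 1.0    # region A expresses genes 0-4
X[30:, 5:10] += 1.0   # region B expresses genes 5-9

W, H = sparse_nmf(X, n_components=2)

# Read the highest-weighted genes of each feature straight from H.
top_genes = [sorted(np.argsort(H[k])[-5:].tolist()) for k in range(2)]
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print("top genes per feature:", top_genes)
print("relative reconstruction error:", rel_err)
```

In this toy setting, the per-feature gene lists come from the same factorization that produces the spatial loadings in `W`, with no separate differential-expression step. On real atlas data the number of components and the penalty strength would need tuning, which the abstract suggests can be guided by intrinsic properties of the data.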

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e13d/9987823/90761782f4fc/pone.0282171.g001.jpg
