An orchestra of machine learning methods reveals landmarks in single-cell data exemplified with aging fibroblasts.

Affiliation

Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany.

Publication information

PLoS One. 2024 Apr 17;19(4):e0302045. doi: 10.1371/journal.pone.0302045. eCollection 2024.

Abstract

In this work, a Python framework for characteristic feature extraction is developed and applied to gene expression data of human fibroblasts. Feature selection without labels objectively determines groups of cells and the minimal gene sets that separate these groups. ML explainability methods translate the features that correlate with phenotypic differences into causal reasoning, supported by further pipeline and visualization tools that let user knowledge strengthen that reasoning. The purpose of the framework is to identify characteristic features that are causally related to phenotypic differences between single cells. The pipeline combines several data science methods with purposeful visualization of the intermediate results, so that these results can be checked systematically and domain knowledge about the investigated process can be incorporated. A specific focus is the extraction of a small but meaningful set of genes to facilitate causal reasoning about the phenotypic differences. One possible application is drug target identification. To this end, the framework proceeds in the following steps: feature reduction (PFA), low-dimensional embedding (UMAP), clustering ((H)DBSCAN), feature correlation (chi-square, mutual information), and ML validation and explainability (SHAP, tree explainer). The pipeline is validated by identifying and correctly separating signature genes associated with aging in fibroblasts from single-cell gene expression measurements: PLK3 (polo-like protein kinase 3), CCDC88A (coiled-coil domain containing 88A), STAT3 (signal transducer and activator of transcription 3), ZNF7 (zinc finger protein 7), SLC24A2 (solute carrier family 24 member 2), and the lncRNA RP11-372K14.2. The code for the preprocessing step is available in the GitHub repository https://github.com/AC-PHD/NoLabelPFA, and the code for the characteristic feature extraction at https://github.com/LauritzR/characteristic-feature-extraction.
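
The feature correlation step named in the abstract (chi-square, mutual information) can be illustrated with a minimal sketch. This is not the authors' code from the repositories above. It assumes that expression values have already been discretized (here binarized) and that cluster labels come from an earlier clustering step. The gene names `gene_a` and `gene_b`, the toy values, and all function names are hypothetical.

```python
import math
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic between a discrete feature and cluster labels."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    stat = 0.0
    for f, fc in feature_counts.items():
        for l, lc in label_counts.items():
            expected = fc * lc / n           # count expected under independence
            observed = joint.get((f, l), 0)  # count actually seen
            stat += (observed - expected) ** 2 / expected
    return stat


def mutual_information(feature, labels):
    """Mutual information (in nats) between a discrete feature and cluster labels."""
    n = len(feature)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, l), c in joint.items():
        # p(f, l) * log( p(f, l) / (p(f) * p(l)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (feature_counts[f] * label_counts[l]))
    return mi


def rank_features(features, labels):
    """Sort feature names by mutual information with the labels, highest first."""
    return sorted(
        features,
        key=lambda name: mutual_information(features[name], labels),
        reverse=True,
    )


# Hypothetical toy data: 8 cells in 2 clusters, 2 binarized genes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_b": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the clusters
    "gene_a": [0, 0, 0, 0, 1, 1, 1, 1],  # separates the clusters perfectly
}
ranking = rank_features(features, labels)
```

In this toy setting `gene_a` ranks above `gene_b`, because it carries all the information about cluster membership while `gene_b` carries none. On real data, a library implementation with significance testing would be the more robust choice.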

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8a9d/11023401/f3ad239f5853/pone.0302045.g001.jpg
