Triku: a feature selection method based on nearest neighbors for single-cell data.

Affiliations

Biodonostia Health Research Institute, Computational Biology and Systems Biomedicine Group, Paseo Dr. Begiristain, s/n, Donostia-San Sebastian, 20014, Spain.

Biodonostia Health Research Institute, Tissue Engineering Group, Paseo Dr. Begiristain, s/n, Donostia-San Sebastian, 20014, Spain.

Publication information

Gigascience. 2022 Mar 12;11. doi: 10.1093/gigascience/giac017.

Abstract

BACKGROUND

Feature selection is a relevant step in the analysis of single-cell RNA sequencing datasets. Most of the current feature selection methods are based on general univariate descriptors of the data such as the dispersion or the percentage of zeros. Despite the use of correction methods, the generality of these feature selection methods biases the genes selected towards highly expressed genes, instead of the genes defining the cell populations of the dataset.

RESULTS

Triku is a feature selection method that favors genes defining the main cell populations. It does so by selecting genes expressed by groups of cells that are close in the k-nearest neighbor graph. Within these neighborhoods, the expression of such genes is higher than would be expected if the k cells were chosen at random. Triku efficiently recovers cell populations present in artificial and biological benchmarking datasets, based on adjusted Rand index, normalized mutual information, supervised classification, and silhouette coefficient measurements. Additionally, gene sets selected by triku are more likely to be related to relevant Gene Ontology terms and contain fewer ribosomal and mitochondrial genes.
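The neighborhood idea described above can be illustrated with a small, self-contained sketch. This is not triku's code or API. It is a simplified toy that assumes a brute-force Euclidean k-nearest neighbor search on log-transformed counts, a null built by drawing k cells at random, and a one-dimensional Wasserstein distance between the two sets of neighborhood sums as the gene score. The function names (`knn_indices`, `neighborhood_scores`), the synthetic data, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def knn_indices(X, k):
    """For each cell (row of X), return the indices of its k nearest cells (Euclidean).

    A cell is normally its own nearest neighbor, so it is usually included.
    """
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.argsort(d2, axis=1)[:, :k]


def neighborhood_scores(X, k, rng):
    """Score each gene by how far its kNN-summed expression departs from a random null.

    observed: expression summed over each cell's k nearest neighbors.
    expected: expression summed over k cells drawn at random for each cell.
    The score is the 1D Wasserstein distance between the two empirical
    distributions, which for equal-size samples is the mean absolute
    difference of the sorted values.
    """
    n_cells = X.shape[0]
    nn = knn_indices(X, k)
    rand = np.stack(
        [rng.choice(n_cells, size=k, replace=False) for _ in range(n_cells)]
    )
    observed = X[nn].sum(axis=1)    # shape: (n_cells, n_genes)
    expected = X[rand].sum(axis=1)  # shape: (n_cells, n_genes)
    return np.abs(np.sort(observed, axis=0) - np.sort(expected, axis=0)).mean(axis=0)


# Synthetic data (hypothetical): 3 populations, 3 marker genes each,
# plus 10 highly expressed genes that do not distinguish populations.
rng = np.random.default_rng(0)
n_per_pop, n_pops, n_markers_per_pop, n_housekeeping = 100, 3, 3, 10
labels = np.repeat(np.arange(n_pops), n_per_pop)
n_markers = n_pops * n_markers_per_pop

rates = np.full((labels.size, n_markers + n_housekeeping), 0.2)
for pop in range(n_pops):
    cols = slice(pop * n_markers_per_pop, (pop + 1) * n_markers_per_pop)
    rates[labels == pop, cols] = 8.0   # markers are high only in their own population
rates[:, n_markers:] = 8.0             # uninformative genes are high everywhere

X = np.log1p(rng.poisson(rates))
marker_idx = np.arange(n_markers)
noise_idx = np.arange(n_markers, X.shape[1])

scores = neighborhood_scores(X, k=15, rng=rng)
ranked = np.argsort(scores)[::-1]

print("genes ranked highest by neighborhood score:", ranked[:n_markers])
print("mean expression of marker genes:", X[:, marker_idx].mean())
print("mean expression of uninformative genes:", X[:, noise_idx].mean())
```

In this toy setting the uninformative genes have the higher average expression, yet the neighborhood score ranks the population-defining genes first, because only their expression is concentrated among cells that are neighbors of one another.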

CONCLUSION

Triku is developed in Python 3 and is available at https://github.com/alexmascension/triku.
