pySAPC, a python package for sparse affinity propagation clustering: Application to odontogenesis whole genome time series gene-expression data.

Author information

Cao Huojun, Amendt Brad A

Affiliations

Iowa Institute for Oral Health Research, College of Dentistry, The University of Iowa, Iowa City, IA 52244, USA.

Iowa Institute for Oral Health Research, College of Dentistry, The University of Iowa, Iowa City, IA 52244, USA; Department of Anatomy and Cell Biology and Craniofacial Anomalies Research Center, Carver College of Medicine, The University of Iowa, Iowa City, IA 52244, USA.

Publication information

Biochim Biophys Acta. 2016 Nov;1860(11 Pt B):2613-8. doi: 10.1016/j.bbagen.2016.06.008. Epub 2016 Jun 8.

Abstract

BACKGROUND

Developmental dental anomalies are common forms of congenital defects. The molecular mechanisms of dental anomalies are poorly understood. Systematic approaches such as clustering genes based on similar expression patterns could identify novel genes involved in dental anomalies and provide a framework for understanding molecular regulatory mechanisms of these genes during tooth development (odontogenesis).

METHODS

A Python package (pySAPC) implementing the sparse affinity propagation clustering algorithm for large datasets was developed. Whole-genome pairwise similarity was calculated from expression patterns measured on 45 microarrays covering several stages of odontogenesis.
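As an illustration of how a sparse pairwise similarity matrix might be built from expression profiles, here is a minimal sketch. The abstract does not state which similarity measure or sparsification rule was used, so the Pearson correlation, the `threshold` cutoff, the function names, and the toy data below are all assumptions made for the example, not the authors' method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no pattern information
    return cov / sqrt(var_x * var_y)


def sparse_similarity(profiles, threshold=0.9):
    """Return {(i, j): similarity}, keeping only pairs at or above threshold.

    The threshold rule is an assumption for this sketch; pairs below it
    are simply not stored, which is what makes the matrix sparse.
    """
    sparse = {}
    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            s = pearson(profiles[i], profiles[j])
            if s >= threshold:
                sparse[(i, j)] = s
                sparse[(j, i)] = s
    return sparse


# Toy data: 4 hypothetical genes measured on 5 arrays (not from the paper).
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.1, 6.0, 8.2, 10.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
    [5.1, 3.9, 3.0, 2.1, 0.9],
]
sim = sparse_similarity(profiles, threshold=0.9)
print(sorted(sim))
```

In this toy case only the pairs (0, 1) and (2, 3) are stored in both directions; the anticorrelated pairs are dropped, so 4 of the 12 possible off-diagonal entries are kept.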

RESULTS

pySAPC identified 743 gene clusters based on expression pattern similarity during mouse tooth development. Three clusters are significantly enriched for genes associated with dental anomalies (with FDR <0.1). The three clusters of genes have distinct expression patterns during odontogenesis.
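To make the enrichment criterion concrete, the following sketch shows one common way to test clusters for over-representation of annotated genes and to control the false discovery rate. The abstract does not name the test or the correction procedure, so the hypergeometric tail test, the Benjamini-Hochberg adjustment, the function names, and the toy counts are assumptions for illustration only.

```python
from math import comb


def hypergeom_tail(hits, cluster_size, annotated, total):
    """P(X >= hits) when cluster_size genes are drawn from total genes,
    of which `annotated` carry the annotation of interest."""
    upper = min(cluster_size, annotated)
    favourable = sum(
        comb(annotated, x) * comb(total - annotated, cluster_size - x)
        for x in range(hits, upper + 1)
    )
    return favourable / comb(total, cluster_size)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: 20 genes, 5 of them annotated, three hypothetical clusters
# given as (cluster_size, annotated_genes_in_cluster).
clusters = [(5, 4), (5, 1), (10, 0)]
pvals = [hypergeom_tail(hits, size, 5, 20) for size, hits in clusters]
fdr = benjamini_hochberg(pvals)
enriched = [idx for idx, q in enumerate(fdr) if q < 0.1]
print(enriched)
```

A cluster is reported as enriched when its adjusted p-value falls below the chosen FDR cutoff (0.1 here, matching the threshold quoted in the results).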

CONCLUSIONS

Clustering genes based on similar expression profiles recovered several known regulatory relationships for genes involved in odontogenesis, as well as many novel genes that may be involved with the same genetic pathways as genes that have already been shown to contribute to dental defects.

GENERAL SIGNIFICANCE

By using a sparse similarity matrix, pySAPC uses much less memory and CPU time than the original affinity propagation program, which uses a full similarity matrix. This Python package will be useful for the many applications in which the dataset is too large for a full similarity matrix. This article is part of a Special Issue entitled "System Genetics"; Guest Editors: Dr. Yudong Cai and Dr. Tao Huang.
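The memory saving comes from passing messages only along stored similarity entries rather than over every pair of points. The sketch below is a minimal pure-Python version of affinity propagation restricted to a sparse edge set, written for illustration; it is not the pySAPC implementation or its API. The function name, the `preference` and `damping` values, the fixed iteration count, and the one-dimensional toy points are all assumptions.

```python
def sparse_affinity_propagation(sim, n, preference, damping=0.9, max_iter=200):
    """Affinity propagation over stored pairs only.

    sim maps (i, k) with i != k to a similarity; missing pairs are never
    considered. Returns, for each point, the index of its exemplar.
    """
    s = dict(sim)
    for k in range(n):
        s[(k, k)] = preference          # self-similarity controls cluster count
    r = {edge: 0.0 for edge in s}       # responsibilities
    a = {edge: 0.0 for edge in s}       # availabilities
    rows = {i: [] for i in range(n)}    # candidate exemplars of point i
    cols = {k: [] for k in range(n)}    # points that may choose exemplar k
    for (i, k) in s:
        rows[i].append(k)
        cols[k].append(i)

    for _ in range(max_iter):
        # Responsibility: r(i,k) = s(i,k) - max over k' != k of a(i,k') + s(i,k')
        for i in range(n):
            vals = [(a[(i, k)] + s[(i, k)], k) for k in rows[i]]
            best = max(vals)
            second = max((v for v in vals if v[1] != best[1]),
                         default=(float("-inf"), -1))
            for k in rows[i]:
                rival = second[0] if k == best[1] else best[0]
                r[(i, k)] = damping * r[(i, k)] + (1 - damping) * (s[(i, k)] - rival)
        # Availability: evidence from other points that k is a good exemplar
        for k in range(n):
            pos = sum(max(0.0, r[(i, k)]) for i in cols[k] if i != k)
            for i in cols[k]:
                if i == k:
                    new = pos
                else:
                    new = min(0.0, r[(k, k)] + pos - max(0.0, r[(i, k)]))
                a[(i, k)] = damping * a[(i, k)] + (1 - damping) * new

    return [max(rows[i], key=lambda k: a[(i, k)] + r[(i, k)]) for i in range(n)]


# Toy data: six hypothetical points on a line; only neighbours within
# distance 1 are stored, with similarity = negative squared distance.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
sim = {}
for i, x in enumerate(points):
    for k, y in enumerate(points):
        if i != k and abs(x - y) <= 1.0:
            sim[(i, k)] = -(x - y) ** 2

labels = sparse_affinity_propagation(sim, len(points), preference=-2.0)
print(labels)
```

Here the sparse structure holds 8 off-diagonal entries plus 6 self-similarities instead of the 36 entries of a full matrix, and the two middle points (indices 1 and 4) emerge as exemplars. For simplicity the sketch runs a fixed number of iterations rather than checking convergence.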
