Orlandini Valerio, Provenzano Aldesia, Giglio Sabrina, Magi Alberto
Medical Genetics Unit, Meyer Children's University Hospital, Florence, Italy.
Department of Experimental and Clinical Medicine, University of Florence, Viale Pieraccini 6, Florence, 50139, Italy.
BMC Bioinformatics. 2017 Jun 28;18(1):321. doi: 10.1186/s12859-017-1734-5.
The identification of copy number variants (CNVs) is essential for studying human genetic variation and for understanding the genetic basis of Mendelian disorders and cancers. At present, genome-wide detection of CNVs can be achieved using microarray or second-generation sequencing (SGS) data. Although these technologies are very different, the genomic profiles they generate are mathematically very similar: each is a noisy signal in which a decrease or increase across consecutive data points represents a deletion or duplication of DNA. In this framework, the most important step of the analysis is segmenting the genomic profile to identify the boundaries of genomic regions with increased or decreased signal.
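The segmentation problem described above can be illustrated with a minimal sketch. It is not the SLM method introduced in this work, nor circular binary segmentation; it is a plain recursive least-squares binary segmentation, chosen only because it is short. The function names (`sse`, `best_split`, `segment`), the thresholds `min_gain` and `min_size`, and the synthetic profile are all assumptions made for illustration. The profile is a log2 ratio signal in which a single-copy deletion in a diploid genome sits near log2(1/2) = -1.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations of the values from their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(values, min_size):
    """Return (gain, index) of the split that most reduces the SSE.

    The index is None when no split reduces the SSE at all.
    """
    total = sse(values)
    best_gain, best_idx = 0.0, None
    for i in range(min_size, len(values) - min_size + 1):
        gain = total - sse(values[:i]) - sse(values[i:])
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_gain, best_idx


def segment(values, min_gain=1.0, min_size=3, offset=0):
    """Recursively split a profile; return breakpoint indices in order.

    min_gain and min_size are illustrative thresholds, not tuned values.
    """
    if len(values) < 2 * min_size:
        return []
    gain, idx = best_split(values, min_size)
    if idx is None or gain < min_gain:
        return []
    left = segment(values[:idx], min_gain, min_size, offset)
    right = segment(values[idx:], min_gain, min_size, offset + idx)
    return left + [offset + idx] + right


# Synthetic log2 ratio profile: neutral, single-copy deletion, neutral.
rng = random.Random(0)
levels = [0.0] * 20 + [-1.0] * 10 + [0.0] * 20
noisy_profile = [level + rng.gauss(0.0, 0.15) for level in levels]

print(segment(noisy_profile))
```

Each returned index marks the first data point of a new segment. The quadratic cost of `best_split` is acceptable for a toy profile of this size but would not scale to genome-wide data.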
Here we introduce SLMSuite, a collection of algorithms based on shifting level models (SLM) for segmenting genomic profiles from array and SGS experiments. The SLM algorithms take as input the log-transformed genomic profiles from SGS or microarray experiments and output segmentation results. We apply our method to synthetic genomic profiles and to real whole-genome sequencing data, and we demonstrate that it outperforms the state-of-the-art circular binary segmentation algorithm in terms of sensitivity, specificity and computational speed.
SLMSuite contains an R library with the segmentation methods and three wrappers that allow them to be used from Python, Ruby and C++. SLMSuite is freely available at https://sourceforge.net/projects/slmsuite.