PeakCNV: A multi-feature ranking algorithm-based tool for genome-wide copy number variation-association study.

Author information

Labani Mahdieh, Afrasiabi Ali, Beheshti Amin, Lovell Nigel H, Alinejad-Rokny Hamid

Affiliations

BioMedical Machine Learning Lab (BML), The Graduate School of Biomedical Engineering, UNSW Sydney, Sydney, NSW 2052, Australia.

Data Analytics Lab, School of Computing, Macquarie University, Sydney, NSW 2109, Australia.

Publication information

Comput Struct Biotechnol J. 2022 Sep 7;20:4975-4983. doi: 10.1016/j.csbj.2022.09.001. eCollection 2022.

Abstract

Copy number variation (CNV) is a type of structural genomic alteration in which a segment of a chromosome is duplicated or deleted. To date, many CNVs have been identified as causative genetic elements for several diseases and phenotypes. However, performing a CNV-based genome-wide association study is challenging because CNVs vary in length and occurrence across the individuals under investigation. One of the most efficient strategies to address this issue is to build CNV regions (CNVRs), genomic regions in which CNVs overlap. However, this approach is susceptible to a high false-positive rate because confounding CNVRs overlap and co-occur with true-positive CNVRs. Here, we develop PeakCNV, which differentiates false-positive CNVRs from true positives by calculating a new metric, the independence ranking score (IR-score), via a feature ranking approach. We compared the performance of PeakCNV with that of other existing tools in two case studies: one using CNV genotype data for individuals with prostate cancer (194 cases and 2,392 healthy individuals) and the other using data for individuals with neurodevelopmental disorders (19,642 cases and 6,451 healthy individuals). Crucially, our benchmarking analyses on the prostate cancer cohort indicated that PeakCNV identifies fewer risk candidate CNVRs, with shorter lengths, than other tools. Importantly, these CNVRs cover a greater proportion of cases relative to healthy individuals than the CNVRs identified by other tools. The accuracy of PeakCNV in identifying relevant candidate CNVRs was reproducible in the case study on neurodevelopmental disorders. Using data from the FANTOM5 expression atlas and the Clinical Genomic Database, we show that the candidate CNVRs identified by PeakCNV for neurodevelopmental disorders overlap with a greater number of genes with brain-enriched expression, and a greater number of genes associated with neurological conditions, than the candidate CNVRs identified by other tools. Taken together, PeakCNV outperformed existing CNV association study tools by identifying more biologically meaningful CNVRs relevant to the phenotype of interest. PeakCNV is publicly available for the analysis of CNV-associated diseases and is accessible from https://rdrr.io/github/mahdieh1/PeakCNV.
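
To make the idea of a CNV region concrete, the following is a minimal Python sketch of the general CNVR-building step described in the abstract: overlapping CNVs on the same chromosome are merged into one region, and the fraction of cases and of healthy individuals carrying a CNV in that region is then computed. It is not PeakCNV's implementation and does not compute the IR-score. The `CNV` record, the function names `build_cnvrs` and `carrier_fractions`, and the choice to treat touching intervals as overlapping are all assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical record for one CNV call; field names are illustrative only.
CNV = namedtuple("CNV", ["chrom", "start", "end", "sample", "is_case"])


def build_cnvrs(cnvs):
    """Merge overlapping CNVs on the same chromosome into CNV regions.

    Assumes inclusive coordinates, so CNVs that share a boundary
    position are treated as overlapping.
    """
    regions = []
    for cnv in sorted(cnvs, key=lambda c: (c.chrom, c.start, c.end)):
        last = regions[-1] if regions else None
        if last and last["chrom"] == cnv.chrom and cnv.start <= last["end"]:
            # Extend the current region and remember the contributing CNV.
            last["end"] = max(last["end"], cnv.end)
            last["members"].append(cnv)
        else:
            regions.append({
                "chrom": cnv.chrom,
                "start": cnv.start,
                "end": cnv.end,
                "members": [cnv],
            })
    return regions


def carrier_fractions(region, n_cases, n_controls):
    """Return (fraction of cases, fraction of controls) with a CNV in the region."""
    case_samples = {c.sample for c in region["members"] if c.is_case}
    control_samples = {c.sample for c in region["members"] if not c.is_case}
    return len(case_samples) / n_cases, len(control_samples) / n_controls
```

A region whose case fraction is well above its control fraction is the kind of candidate a CNV association study would examine further. According to the abstract, PeakCNV adds a ranking step on top of such regions to separate confounding CNVRs from true positives.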

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fec0/9478359/42b671e70cb9/gr1.jpg