Asgari Yazdan, Sugier Pierre-Emmanuel, Baghfalaki Taban, Lucotte Elise, Karimi Mojgan, Sedki Mohammed, Ngo Amélie, Liquet Benoit, Truong Thérèse
Paris-Saclay University, UVSQ, Gustave Roussy, Inserm, CESP, Team Exposome and Heredity, 94807 Villejuif, France.
Laboratoire de Mathématiques et de leurs Applications de Pau, Université de Pau et des Pays de l'Adour, UMR CNRS 5142, E2S-UPPA, 64000 Pau, France.
NAR Genom Bioinform. 2023 Jul 5;5(3):lqad065. doi: 10.1093/nargab/lqad065. eCollection 2023 Sep.
Cross-phenotype association using gene-set analysis can help to detect pleiotropic genes and reveal mechanisms shared between diseases. Although the number of statistical methods for exploring pleiotropy is increasing, there is a lack of suitable pipelines for applying gene-set analysis in this context to genome-scale data within a reasonable running time. We designed a user-friendly pipeline to perform cross-phenotype gene-set analysis between two traits using GCPBayes, a method developed by our team. All analyses can be run automatically by calling the provided scripts (through a Shiny app, Bash or an R script). A Shiny application was also developed to create plots that visualize the outputs of GCPBayes. A comprehensive, step-by-step tutorial on how to use the pipeline is provided on our group's GitHub page. We illustrated the pipeline on publicly available genome-wide association study (GWAS) summary statistics to identify breast cancer and ovarian cancer susceptibility genes. The GCPBayes pipeline recovered pleiotropic genes previously reported in the literature and also identified new pleiotropic genes and regions that merit further investigation. We also provide recommendations on parameter selection to reduce the computational time of GCPBayes on genome-scale data.