Lee Sungyoung, Choi Sungkyoung, Qiao Dandi, Cho Michael, Silverman Edwin K, Park Taesung, Won Sungho
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea.
Department of Pharmacology, Yonsei University College of Medicine, Seoul, South Korea.
BMC Med Genomics. 2018 Apr 20;11(Suppl 2):39. doi: 10.1186/s12920-018-0345-y.
Mendelian transmission produces phenotypic and genetic relatedness between family members, giving family-based analytical methods an important role in genetic epidemiological studies, from heritability estimation to genetic association analysis. With advances in genotyping technologies, whole-genome sequence data can now be used in genetic epidemiological studies, and family-based samples may become more useful for detecting de novo mutations. However, genetic analyses of family-based samples usually suffer from the complexity of the computational and statistical algorithms involved, and certain family designs, such as those incorporating data from extended families, have rarely been used.
We present the Workbench for Integrated Superfast Association studies for Related Data (WISARD), programmed in C/C++. WISARD enables fast and comprehensive analysis of SNP-chip and next-generation sequencing data from extended families, with applications ranging from designing genetic studies to summarizing analysis results. In addition, WISARD can run automatically in a fully multithreaded manner, and its integration with R for visualization makes it more accessible to non-experts.
Comparison with existing toolsets showed that WISARD is computationally suitable for integrated analysis of related subjects and that it outperforms those toolsets. WISARD has also been used successfully to analyze a large-scale sequencing dataset for chronic obstructive pulmonary disease (COPD), in which we identified multiple genes associated with COPD, demonstrating its practical value.