Center for Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA.
BMC Med Genomics. 2010 Oct 29;3:50. doi: 10.1186/1755-8794-3-50.
BACKGROUND: Disease-specific genetic information has been increasing rapidly as a consequence of recent improvements and massive cost reductions in sequencing technologies. Numerous systems designed to capture and organize this mounting sea of genetic data have emerged, but these resources differ dramatically in their disease coverage and genetic depth. With few exceptions, researchers must manually search a variety of sites to assemble a complete set of genetic evidence for a particular disease of interest, a process that is both time-consuming and error-prone.
METHODS: We designed a real-time aggregation tool that provides both comprehensive coverage and reliable gene-to-disease rankings for any disease. Our tool, called Genotator, automatically integrates data from 11 externally accessible clinical genetics resources and uses these data in a straightforward formula to rank genes in order of disease relevance. We tested the accuracy of Genotator's coverage on three diseases for which specialty curated databases exist: Autism Spectrum Disorder, Parkinson's Disease, and Alzheimer Disease. Genotator is freely available at http://genotator.hms.harvard.edu.
RESULTS: Genotator demonstrated that most of the 11 selected databases contain unique information about the genetic composition of disease, with 2514 genes found in only one of the 11 databases. These findings confirm that integrating these databases provides a more complete picture than would be possible from any one database alone. Genotator successfully identified at least 75% of the top-ranked genes in all three of our use cases, including a 90% concordance with the top 40 ranked candidates for Alzheimer Disease.
CONCLUSIONS: As a meta-query engine, Genotator provides high coverage of both historical genetic research and recent advances in the genetic understanding of specific diseases. As such, Genotator provides a real-time aggregation of ranked data that remains current with the pace of research in the disease fields. Genotator's algorithm transforms query terms to match the input requirements of each targeted database and resolves named synonyms to ensure full coverage of the genetic results with official nomenclature. Genotator generates an Excel-style output that is consistent across disease queries and readily importable into other applications.
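The aggregation steps described in the abstract (resolve synonyms to official symbols, merge hits across databases, rank genes, and compare against a curated list) can be illustrated with a minimal Python sketch. This is not Genotator's implementation: the abstract does not state the ranking formula, so the score used here (the number of sources reporting a gene) is an assumption, and the source names, the small synonym table, and all function names are hypothetical.

```python
from collections import defaultdict

# Illustrative toy synonym table (alias -> official symbol); a real tool
# would draw on a full nomenclature resource instead.
SYNONYMS = {"AD1": "APP", "PS1": "PSEN1"}


def official_symbol(name, synonyms=SYNONYMS):
    """Normalize a gene name and map known aliases to the official symbol."""
    key = name.strip().upper()
    return synonyms.get(key, key)


def aggregate(hits_by_source, synonyms=SYNONYMS):
    """Map each official gene symbol to the set of sources reporting it."""
    sources_by_gene = defaultdict(set)
    for source, genes in hits_by_source.items():
        for gene in genes:
            sources_by_gene[official_symbol(gene, synonyms)].add(source)
    return dict(sources_by_gene)


def rank(sources_by_gene):
    """Rank genes by source count (assumed score); ties break alphabetically."""
    return sorted(sources_by_gene, key=lambda g: (-len(sources_by_gene[g]), g))


def unique_to_one_source(sources_by_gene):
    """Genes reported by exactly one source."""
    return sorted(g for g, s in sources_by_gene.items() if len(s) == 1)


def concordance(ranked, reference, top_n):
    """Fraction of the first top_n reference genes present in the ranked list."""
    top_reference = reference[:top_n]
    if not top_reference:
        return 0.0
    found = set(ranked)
    hits = sum(1 for g in top_reference if official_symbol(g) in found)
    return hits / len(top_reference)


# Hypothetical query results from three made-up sources.
hits = {
    "source_a": ["APP", "PSEN1", "APOE"],
    "source_b": ["AD1", "APOE", "SORL1"],
    "source_c": ["PS1", "APOE"],
}
merged = aggregate(hits)
ranked = rank(merged)
singletons = unique_to_one_source(merged)
score = concordance(ranked, ["APOE", "PSEN1", "PSEN2", "APP"], top_n=4)
```

Resolving aliases before counting matters in this sketch: without it, `AD1` and `APP` would be tallied as two separate single-source genes, which would inflate the count of genes found in only one database and lower the gene's rank.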