Podicheti Ram, Mockaitis Keithanne
Center for Genomics and Bioinformatics, Indiana University, 1001 E. Third Street, Bloomington, IN 47405, USA; School of Informatics and Computing, Indiana University, 919 E. Tenth Street, Bloomington, IN 47408, USA.
Pervasive Technology Institute, Indiana University, 2709 E. Tenth Street, Bloomington, IN 47408, USA; Department of Biology, Indiana University, 915 E. Third Street, Bloomington, IN 47405, USA.
Methods. 2015 Jun;79-80:11-7. doi: 10.1016/j.ymeth.2015.04.028. Epub 2015 Apr 29.
As approaches are sought for more efficient and democratized uses of non-model and expanded model genomics references, ease of integration of genomic feature datasets is especially desirable in multidisciplinary research communities. Valuable conclusions are often missed or slowed when researchers refer experimental results to a single reference sequence that lacks integrated pan-genomic and multi-experiment data in accessible formats. Association of genomic positional information, such as results from an expansive variety of next-generation sequencing experiments, with annotated reference features such as genes or predicted protein binding sites, provides the context essential for conclusions and ongoing research. When the experimental system includes polymorphic genomic inputs, rapid calculation of the gene structural and protein translational effects of sequence variation from the reference can be invaluable. Here we present FEATnotator, a lightweight, fast and easy-to-use open-source software program that integrates and reports overlap and proximity in genomic information from any user-defined datasets, including those from next-generation sequencing applications. We illustrate use of the tool by summarizing whole-genome sequence variation of a widely used natural isolate of Arabidopsis thaliana in the context of gene models of the reference accession. A previously discovered protein-coding deletion that influences root development is replicated rapidly. FEATnotator is appropriate even for investigations of a single gene or of genic regions such as QTL, and its comprehensive reports better prepare researchers to interpret their experimental results. The tool is available for download at http://featnotator.sourceforge.net.
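To make the idea of reporting overlap and proximity concrete, the following is a minimal Python sketch of that kind of calculation. It is not FEATnotator's implementation and does not reflect its input formats or options: the `Feature` class, the `distance` and `annotate` functions, the 1-based inclusive coordinates, and the `max_distance` cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Feature:
    """A hypothetical genomic interval; coordinates are 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int


def distance(a: Feature, b: Feature) -> Optional[int]:
    """Return 0 if the intervals overlap, otherwise the coordinate difference
    between their nearest ends; None if they lie on different sequences."""
    if a.chrom != b.chrom:
        return None
    if a.end < b.start:      # a lies entirely upstream of b
        return b.start - a.end
    if b.end < a.start:      # a lies entirely downstream of b
        return a.start - b.end
    return 0                 # the intervals share at least one base


def annotate(queries: List[Feature], references: List[Feature],
             max_distance: int = 1000) -> List[Tuple[str, str, str, int]]:
    """Report each query/reference pair that overlaps or lies within
    max_distance bases (an illustrative cutoff, not a tool default)."""
    rows = []
    for q in queries:
        for r in references:
            d = distance(q, r)
            if d is None or d > max_distance:
                continue
            relation = "overlap" if d == 0 else "proximal"
            rows.append((q.name, r.name, relation, d))
    return rows


# Made-up example data: two gene models and three variant positions.
genes = [Feature("geneA", "chr1", 1000, 2000),
         Feature("geneB", "chr1", 5000, 6000)]
variants = [Feature("var1", "chr1", 1500, 1500),
            Feature("var2", "chr1", 2300, 2300),
            Feature("var3", "chr2", 1500, 1500)]

for row in annotate(variants, genes):
    print(row)
```

The all-pairs loop keeps the sketch short; for whole-genome datasets one would more plausibly sort features by position or use an interval index so that each query is compared only against nearby references.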