Rudd K E, Miller W, Werner C, Ostell J, Tolstoshev C, Satterfield S G
Laboratory of Bacterial Toxins, Food and Drug Administration, Bethesda, MD 20892.
Nucleic Acids Res. 1991 Feb 11;19(3):637-47. doi: 10.1093/nar/19.3.637.
Methods are presented for organizing and integrating DNA sequence data, restriction maps, and genetic maps for the same organism but from a variety of sources (databases, publications, personal communications). Proper software tools are essential for successful organization of such diverse data into an ordered, cohesive body of information, and a suite of novel software to support this endeavor is described. Though these tools automate much of the task, a variety of strategies is needed to cope with recalcitrant cases. We describe such strategies and illustrate their application with numerous examples. These strategies have allowed us to order, analyze, and display over one megabase of E. coli DNA sequence information. The integration task often exposes inconsistencies in the available data, perhaps caused by strain polymorphisms or human oversight, necessitating the application of sound biological judgment. The examples illustrate both the level of expertise required of the database curator and the knowledge gained as apparent inconsistencies are resolved. The software and mapping methods are applicable to the study of any genome for which a high resolution restriction map is available. They were developed to support a weakly coordinated sequencing effort involving many laboratories, but would also be useful for highly orchestrated sequencing projects.