This project is a collaboration between The Forsyth Institute (TFI) and The Institute for Genomic Research (TIGR), and is funded by the National Institute of Dental and Craniofacial Research (NIDCR).

Gap closure, editing, and sequence completion

The complete genome sequence is obtained by sequencing across the gaps between contigs. While gap filling has occupied a major portion of the time and expense of other genome sequencing projects, it is expected to be minimal in the proposed project. This is primarily due to: 1) the relatively high (~8.0X) coverage in the random phase of the project; 2) the use of a large insert (15-23 kb) library in addition to the small insert library; 3) long sequence read lengths; and 4) the use of sequence from both ends of all clones.
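The link between coverage and the number of gaps can be estimated with the Lander-Waterman model. This model is not part of the project description; it is a common back-of-envelope check. The sketch below assumes that reads start uniformly at random and ignores the minimum overlap needed to join two reads. The genome size and read length are illustrative placeholders, not project figures.

```python
import math


def expected_gaps(genome_size, read_length, coverage):
    """Lander-Waterman estimate of the gaps left after random sequencing.

    Assumes uniformly random read starts and no minimum overlap requirement.
    """
    n_reads = coverage * genome_size / read_length
    # A read ends a contig when no other read starts inside it,
    # which happens with probability of about e^-coverage.
    return n_reads * math.exp(-coverage)


def uncovered_fraction(coverage):
    """Expected fraction of bases not covered by any read."""
    return math.exp(-coverage)


# Hypothetical inputs: a 2.3 Mb genome and 500-base reads.
for c in (2.0, 4.0, 8.0):
    gaps = expected_gaps(2_300_000, 500, c)
    print(f"{c:.0f}X: ~{gaps:.0f} gaps, {uncovered_fraction(c):.4%} uncovered")
```

Under these assumptions, raising coverage from 2X to 8X cuts the expected number of gaps from more than a thousand to roughly a dozen, which is consistent with the expectation that gap filling will be a small part of the work.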

Sequence gaps are regions for which a template is available (i.e., the unsequenced region in the middle of a clone). They will be closed by designing primers that point outward from the ends of contigs (toward the center of the unread segment) and performing dye terminator sequencing reactions on the appropriate template. Physical gaps are those for which no template is available (i.e., no clones have been identified that cross the gap); they result in contigs whose order with respect to one another is unknown. If the libraries are truly random, there should be no physical gaps in the sequence, only sequence gaps. However, if non-random selection does produce physical gaps, they can be closed using the methods summarized below.
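The idea of primers that point outward from a contig can be sketched in a few lines. This is a minimal illustration only: the function names and the example sequence are made up, and real primer design also has to consider melting temperature, repeats, and sequence quality at the contig ends, none of which is handled here.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def outward_primers(contig, length=20):
    """Return (left_primer, right_primer), both written 5'->3'.

    The right primer copies the last `length` bases, so extension runs
    off the right end of the contig. The left primer is the reverse
    complement of the first `length` bases, so extension runs off the
    left end.
    """
    if len(contig) < 2 * length:
        raise ValueError("contig too short for non-overlapping end primers")
    left = reverse_complement(contig[:length])
    right = contig[-length:]
    return left, right


# Made-up contig sequence, for illustration only.
contig = "ATGCGTACCGTTAGCCTAGGATCCGATTACGCGTTAACCGGTAC"
left, right = outward_primers(contig, length=12)
print("left primer: ", left)
print("right primer:", right)
```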

Table 6. Gap Closure Strategies for Whole Genome Sequencing Projects

Contig Ordering

    Clone links: forward and reverse ends of clones (2 kb and 18 kb libraries)

    Split peptides: protein matches at ends of potentially adjacent contigs

    Southern blots: oligonucleotide fingerprint comparisons

    Combinatorial reactions with all contig end-oligonucleotides

Sequence Gaps (DNA template available)

    Dye terminator reactions: primer walking to cover both strands of the gap

Physical Gaps (No DNA template available)

    Regular or long-range PCR products sequenced directly

    Large insert libraries: clones isolated from large insert libraries sequenced directly
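To give a sense of scale for the combinatorial reactions listed in Table 6, the sketch below enumerates every pairing of end primers taken from different contigs. It is an illustration under stated assumptions: the contig names are hypothetical, each contig is assumed to have one primer per end, and pairs from the same contig are excluded for simplicity.

```python
from itertools import combinations


def contig_end_primers(contig_names):
    """List one (contig, end) primer label for each end of each contig."""
    return [(name, end) for name in contig_names for end in ("left", "right")]


def combinatorial_pairs(contig_names):
    """Return all primer pairs whose two primers come from different contigs."""
    ends = contig_end_primers(contig_names)
    return [(a, b) for a, b in combinations(ends, 2) if a[0] != b[0]]


# Hypothetical contig names.
pairs = combinatorial_pairs(["contig1", "contig2", "contig3"])
print(len(pairs))  # 2 * n * (n - 1) = 12 for n = 3
for first, second in pairs:
    print(first, second)
```

The number of reactions grows as 2n(n - 1) for n contigs, which is one reason this approach is practical only once the other ordering methods have reduced the number of unlinked contigs.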

Following completion of the assembly process, the sequence data are edited using the TIGR Editor (developed as a collaborative effort between TIGR and the Applied Biosystems Division of Perkin Elmer). The editor can download contigs from the database and provides a graphical interface to the electropherograms, so that the data associated with the aligned sequence file output by TIGR Assembler can be edited. In addition to editing on the basis of electropherogram data, frame shifts identified during annotation of the genome sequence are used to flag areas that might require further editing or sequencing. TIGR Editor has been used as the primary sequence viewing and editing tool for the H. influenzae and M. genitalium projects.

This page was created and is maintained by Drs. Margaret Duncan, Floyd Dewhirst, and Tsute Chen, Department of Molecular Genetics, The Forsyth Institute.

Last modified on 02/20/2002

Copyright 2000, 2001, 2002 by The Forsyth Institute