Westward by Sea: A Maritime Perspective on
American Expansion, 1820-1890
Digitizing the Collection
All scanning was performed by staff of Mystic Seaport, using a variety
of scanners (listed below). The most appropriate scanner was used for
each item. The objective was to capture images at a resolution that
supported reference and on-screen study, not to replace originals. In
general, printed text was captured as bitonal images while manuscript and
pictorial items were captured in greyscale or color. However, manuscript
materials scanned from bound volumes were captured as bitonal images
because of limitations of the software associated with the overhead
scanner.
The original items in this collection vary widely in size, from
nautical charts to postcards. Therefore, rather than attempting to
generate master images of a particular size, materials were scanned at a
resolution of 300 or 400 dpi, with the objective of creating a fully
readable master image. Likewise, service GIF images were derived from the master TIFF
images by scaling down to certain percentages, rather than to
a specified spatial resolution.
Although the service images are of irregular size, the images of larger
manuscript items are much more legible than they would be if constrained to
fit on a 640 x 480 screen.
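As a rough illustration of why percentage scaling yields service images of irregular size, the sketch below computes the pixel dimensions of a master scan from an item's physical size and the scan resolution, then applies a percentage reduction. The item sizes and the 20% figure are hypothetical; the actual percentages used are not recorded here.

```python
def master_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a scan of an item measured in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def scale_by_percent(size, percent):
    """Scale (width, height) by a percentage, keeping at least 1 pixel."""
    width, height = size
    return (
        max(1, round(width * percent / 100)),
        max(1, round(height * percent / 100)),
    )


# Hypothetical items: a postcard and a large chart, both scanned at 300 dpi.
postcard = master_pixels(5.5, 3.5, 300)
chart = master_pixels(36, 24, 300)

# The same percentage gives very different service-image sizes.
print(scale_by_percent(postcard, 20))
print(scale_by_percent(chart, 20))
```

Scaling to a fixed pixel size instead would shrink the chart far more aggressively than the postcard, which is the loss of legibility the percentage approach avoids.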
Printed text pages were scanned at 400 dpi to produce better images for
optical character recognition and then reduced to 300 dpi. Color images
were produced at 400 dpi. Large format items, such as charts, were scanned
at 300 dpi.
For service images of text and manuscript pages, the GIF format was
selected since it is more suitable for text than the JPEG format. Most
GIF images of handwritten material were saved in 6-bit grey rather than
8-bit to conserve storage space without decreasing legibility. Color
images were saved as 6-bit GIFs.
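The trade-off behind 6-bit grey can be illustrated with a minimal sketch. It assumes a simple uniform requantization (how the imaging software actually reduced bit depth is not stated here): each 8-bit grey value is mapped to one of 64 evenly spaced levels and back, so the palette shrinks from 256 entries to 64 while the change to any single pixel stays small.

```python
def quantize_grey(value, bits):
    """Map an 8-bit grey value (0-255) to the nearest of 2**bits evenly spaced levels."""
    top = (1 << bits) - 1               # highest level index, e.g. 63 for 6-bit
    index = round(value * top / 255)    # nearest level index
    return round(index * 255 / top)     # back to the 0-255 scale for display


# Assumed uniform quantization, applied to every possible 8-bit grey value.
palette = sorted({quantize_grey(v, 6) for v in range(256)})
worst_error = max(abs(v - quantize_grey(v, 6)) for v in range(256))

print(len(palette))   # number of distinct grey levels after quantization
print(worst_error)    # largest change to any single pixel value
```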
To make bitonal text images more legible, they were converted to
greyscale and scaled down with a high amount of dither. The standard
procedure was to convert to 8-bit grey, scale down the dimensions by a
certain percentage with approximately 87% dithering, and save as a 4-bit GIF.
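The reason for passing through greyscale can be shown with a small sketch: when a bitonal image is shrunk after conversion to grey, blocks that are partly ink and partly paper become intermediate greys instead of being forced to pure black or white. The code below uses plain block averaging on an invented 4 x 4 patch and then reduces to 16 levels; it does not reproduce the dithering step or the settings of the batch software actually used.

```python
def bitonal_to_grey(rows):
    """Convert bitonal pixels (1 = black ink, 0 = white paper) to 8-bit grey."""
    return [[0 if px else 255 for px in row] for row in rows]


def downscale(grey, factor):
    """Shrink by an integer factor, averaging each factor x factor block."""
    height = len(grey) // factor
    width = len(grey[0]) // factor
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [
                grey[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def to_4bit(grey):
    """Reduce 8-bit grey values to 16 levels (indices 0-15)."""
    return [[value * 15 // 255 for value in row] for row in grey]


# Hypothetical 4 x 4 bitonal patch containing a thin diagonal stroke.
patch = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

small = to_4bit(downscale(bitonal_to_grey(patch), 2))
print(small)
```

The two blocks crossed by the stroke come out as a mid grey, which is what keeps thin strokes visible at reduced size.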
Objects from the Mystic Seaport Museum's Curatorial Department were first
photographed and then scanned.
Further details on equipment and practices are provided below.
Hardware
- PCs for scanning and transcribing (Hewlett-Packard 6630 and 6640)
- G3 Macintosh for OCR
- Two Minolta PS 3000 overhead scanners for books and other bound items
- One Epson Expression 836XL 11" x 17" scanner for flat material
- One Canon MS 400 microfilm scanner used for bound items that had been filmed (Contex FSC 3010 DSP for oversize items)
- Two CD burners
- Dedicated server (Hewlett-Packard NetServer LH-3)
Software
- Proprietary software for Minolta overhead scanners
- Adobe Photoshop (scanning, easier image manipulation)
- DeBabelizer by Equilibrium (easy batch capability)
- OmniPage by Caere (OCR)
- Word processing program
Archival Images
- Original scan
- Not available to general user
- TIFF format
- High dpi
- Stored temporarily on server, longer term on CDs and tapes
Service Images
- Derived from original
- Created for speed and ease of use
- GIF and JPEG formats
- Low dpi
- Stored long term on server
Scanning Specifications for Archival Images
- Text at 300 dpi, images at 400 dpi
- Capture as much color content as is necessary or possible (BW sufficient for books)
- Less concern about pixel dimensions
- Save as TIFFs
- Group IV compression for BW; LZW compression for greyscale/color
Creating Service Images
- Decrease dpi and number of colors to decrease file size and retrieval time
- GIF -- only 72 dpi, 8-bit colors max
- JPEG -- any dpi, 24-bit color or 8-bit greyscale; best for photos/illustrations
- Use imaging program with batch capability
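The effect of lowering dpi and color depth on file size can be estimated with simple arithmetic. The sketch below computes uncompressed sizes only; real TIFF, GIF, and JPEG files are compressed, so actual sizes will be smaller. The 8.5 x 11 inch page is a hypothetical example.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed image size in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Hypothetical 8.5 x 11 inch page.
master = uncompressed_bytes(8.5, 11, 400, 24)   # 400 dpi, 24-bit color
service = uncompressed_bytes(8.5, 11, 72, 8)    # 72 dpi, 8-bit palette

print(master, service, round(master / service))
```

Resolution dominates the result: because pixel count grows with the square of dpi, dropping from 400 to 72 dpi reduces the data far more than the change in bit depth does.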
Transcription
- For manuscript material with significant content, such as letters/journals
- Allows searchability of text
- Good for hard-to-decipher material
- If a collection is well documented through finding aids, transcription may be unnecessary
Transcription Rules
- Transcribe verbatim, retaining grammar, spelling and punctuation
- Undecipherable words are noted with a triple question mark (???)
- Where a word is uncertain, our best reading is entered, followed immediately by [?]
- Beginnings and ends of pages are noted
- Special encoding is used for dates and for latitude and longitude coordinates
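The uncertainty markers in these rules lend themselves to a simple automated check. The following sketch counts undecipherable words (`???`) and collects uncertain readings (a word followed immediately by `[?]`) in a transcription. The sample sentence is invented for illustration, and the page-boundary and date encodings are not modeled because their exact form is not given here.

```python
import re

UNDECIPHERABLE = re.compile(r"\?\?\?")
UNCERTAIN = re.compile(r"(\S+?)\[\?\]")


def review_list(text):
    """Return (count of undecipherable words, list of uncertain readings)."""
    return len(UNDECIPHERABLE.findall(text)), UNCERTAIN.findall(text)


# Invented sample text, not taken from the collection.
sample = "Made sail at ??? and stood for the Farallones[?] with a fresh breeze[?]"
print(review_list(sample))
```

A list like this could be handed to a second reader for review of the doubtful passages.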
Optical Character Recognition (OCR)
- Is used for books and printed material with significant content
- Allows searchability of text
- Beginnings and ends of pages are noted
- Special encoding is used for dates and coordinates
File Storage
- Archival images (TIFFs) are stored temporarily on the server and then transferred to both tape and CD (two copies, on different media)
- Service images (GIFs and JPEGs) and text files are available from the server