Mining the Historian Archives 04

Getting to the Specifics

The point of this project is to make information available in usable form. We’ll now look at each collection and see what I envision us doing with it.


The Seven-Volume Index

The Seven-Volume Index is a list of every name in our seven published volumes (the two volumes of Dwight’s History plus our five Updates, produced in the 1980s and 1990s), including both maiden and married names.

An index section might look like this:

AARHUS,
 - Boyd Lee      TO 2 0477
 - Elizabeth K.  TO 2 0477
 - Jamie Alecia  TO 2 0477
AASLID,
   Svanhild B.   HA 3 0550
ABAIR,
 - Frederick I.  JO 3 0097
 - Gladys Ellen* JO 3 0097
 - Judy*         JO 3 0109

The above is formatted as follows:

  • An asterisk after the name indicates a married (as opposed to maiden) name.
  • The two-letter column indicates the Descendant Line. In the above, TO indicates Thomas son of Elder John Strong; HA indicates Hannah daughter of Elder John Strong; JO indicates John son of Elder John Strong.
  • The next column indicates the Updates volume (1-5) or Dwight (D).
  • The four digits indicate the page number within that volume. The two Dwight volumes share a single continuous page sequence.
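
As a rough illustration of how these fields might be pulled out of proofed text, here is a minimal Python sketch. The names IndexEntry and parse_index are hypothetical, and the sketch assumes the OCR output has already been corrected so that it matches the layout of the sample above (a surname line ending in a comma, followed by entry lines with an optional leading dash).

```python
import re
from dataclasses import dataclass
from typing import Iterator, Optional

# Pattern for one entry line, based only on the sample layout shown above.
ENTRY_RE = re.compile(
    r"""
    ^\s*(?:-\s+)?            # optional leading dash
    (?P<given>.+?)           # given name(s)
    (?P<married>\*)?         # trailing asterisk marks a married name
    \s+(?P<line>[A-Z]{2})    # two-letter Descendant Line code
    \s+(?P<volume>[1-5D])    # Updates volume 1-5, or D for Dwight
    \s+(?P<page>\d{4})\s*$   # four-digit page number
    """,
    re.VERBOSE,
)


@dataclass(frozen=True)
class IndexEntry:
    surname: str
    given: str
    married_name: bool
    line: str
    volume: str
    page: int


def parse_index(text: str) -> Iterator[IndexEntry]:
    """Yield one IndexEntry per entry line; surname lines set the context."""
    surname: Optional[str] = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        match = ENTRY_RE.match(raw)
        if match and surname is not None:
            yield IndexEntry(
                surname=surname,
                given=match["given"].strip(),
                married_name=match["married"] is not None,
                line=match["line"],
                volume=match["volume"],
                page=int(match["page"]),
            )
        elif raw.rstrip().endswith(","):
            surname = raw.strip().rstrip(",")
        # Anything else is left for a volunteer proofreader to inspect.


sample = """AARHUS,
 - Boyd Lee      TO 2 0477
ABAIR,
 - Gladys Ellen* JO 3 0097
"""
for entry in parse_index(sample):
    print(entry)
```

Lines that fit neither pattern are skipped silently here. In practice they would probably be collected into a list for the volunteers to review.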

I have in my possession a low-quality scan of the Seven-Volume Index in PDF form, one file for each letter of the alphabet.

I also have in my possession the original master printed copy which was sent to the printing service. Since it was printed with an old-style computer font (the best available at the time), modern OCR programs have difficulty converting the printed pages to text correctly.

It should, in theory, be possible to digitize this artifact. Volunteers may be available to proof and correct the result. The data fields can be readily inferred from the above.

The resulting database gives us a master name index. This tells any visitor to the Web site (note the implied requirement that this project is visible on the Web) that a person with that name is listed on a specific page of a specific published book. When many people share the same name, the index alone gives no indication of which one is the “right” person the visitor seeks.

This database does NOT give us a Dwight classification. We have a second source of information which DOES give us a Dwight classification.

The Dwight number refers to a specific person in the Dwight book. In the case of the five Updates volumes, each section of the updates shows descendants of a specific person, that is, a person with a specific Dwight number. Each book’s Table of Contents gives the Dwight number, that person’s name, and the page range in the book.

We can therefore take the table of contents (generally a single page) and create a database. The Seven-Volume Index gives us the volume and page, and that volume’s table of contents gives us the Dwight number for that page range.
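
The page-range lookup described above can be sketched as follows. This is only an illustration under stated assumptions: TocSection, build_toc, and dwight_for_page are hypothetical names, the Dwight number is held as a plain string because its exact format is not specified here, and the sample rows are placeholders rather than real table-of-contents data.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class TocSection(NamedTuple):
    first_page: int
    last_page: int
    dwight_number: str   # kept as text; the real format is an assumption
    ancestor_name: str


def build_toc(sections: List[TocSection]) -> List[TocSection]:
    """Return the sections of one volume sorted by first page."""
    return sorted(sections)


def dwight_for_page(toc: List[TocSection], page: int) -> Optional[TocSection]:
    """Find the section whose page range contains `page`, if any."""
    starts = [section.first_page for section in toc]
    i = bisect_right(starts, page) - 1
    if i >= 0 and toc[i].first_page <= page <= toc[i].last_page:
        return toc[i]
    return None


# Placeholder rows only; real values would come from each volume's
# table of contents.
toc_by_volume = {
    "2": build_toc([
        TocSection(1, 199, "PLACEHOLDER-A", "Example Person A"),
        TocSection(200, 350, "PLACEHOLDER-B", "Example Person B"),
    ]),
}

section = dwight_for_page(toc_by_volume["2"], 250)
print(section.dwight_number if section else "no section covers that page")
```

One table is kept per Updates volume, so an index entry's volume selects the table and its page number selects the section.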

We therefore have the ability to take two collections, published decades apart, and publish a master list of names connected to both their Dwight classification and to their specific listing in the published volumes.

Thanks to John Langbehn, we also have a GEDCOM file containing about 75% of the 29,000 persons in the Dwight volumes.

  • We are importing the Langbehn GEDCOM file into TNG and displaying this online in searchable and clickable form.
  • Each person has their Dwight number as part of the record, thanks to Langbehn’s countless years of transcribing to database form.
  • John Langbehn sends me updated versions of the database from time to time as he continues typing. The Dwight numbers don’t change, so as long as we use the Dwight number as access point, we can handle the updates.
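
To show why a stable Dwight number makes the periodic updates manageable, here is a minimal sketch of refreshing our records from a newer export. It assumes each person has already been read out of the GEDCOM file into a simple dictionary with a "dwight_number" key. How the Dwight number is actually stored in the Langbehn file is not covered here, and apply_update is a hypothetical name.

```python
from typing import Dict, List, Tuple

Person = Dict[str, str]


def apply_update(
    current: Dict[str, Person], incoming: List[Person]
) -> Tuple[List[str], List[str]]:
    """Merge a newer export into `current`, keyed by Dwight number.

    Returns the Dwight numbers that were newly added and those whose
    record changed, so the changes can be reviewed.
    """
    added: List[str] = []
    changed: List[str] = []
    for person in incoming:
        key = person["dwight_number"]
        if key not in current:
            added.append(key)
        elif current[key] != person:
            changed.append(key)
        current[key] = person
    return added, changed
```

Because links from the name index point at the Dwight number rather than at any internal record ID, they keep working after each refresh.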

This means that, for every single person named in the seven published volumes, we can provide:

  • A link to the relevant person in the Langbehn database, assuming Langbehn has typed that person in. This lets the visitor explore the family relationships.
  • The volume and page number where that person appears in our published books. This promotes book sales, a significant revenue source. By knowing which family line the named person is in, and linking (where possible) to the Langbehn database, our visitor has a good chance of figuring out whether or not this is the person or family line they seek.

I believe this much is achievable as a relatively small-scale project. The outcome is of direct benefit to the SFAA and to potentially interested persons. It has a direct (and positive) revenue impact, and it provides information which, until now, has simply not been accessible.