Project Design 03

Seven-Volume Index

What is the best way to produce a usable result that any member of the Association can see is clearly beneficial? I think the answer is the Seven-Volume Index. Ideally we would connect it to the Langbehn database, but I’m not sure there’s time for that.

One key challenge is digitizing the Index itself. I have much of the original index on floppy disks, in a custom file format produced by MS-DOS software, probably running on a PC/AT. There’s no way of knowing whether those disks are complete, or whether they represent the final copy after proofing.

I therefore think the best approach is to scan and OCR the printed text. The index runs to 861 pages of 8.5×11″ (standard letter size) paper, printed in four columns per page. That multi-column layout may make it a major challenge to convert the scans into a usable text file.
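
One common way to keep an OCR engine from reading straight across the four columns is to crop each column out of the page image and OCR it separately. The sketch below only computes the crop boxes. It assumes equal-width columns and a uniform half-inch margin, neither of which has been measured on the actual pages, and `column_boxes` is a hypothetical helper name.

```python
def column_boxes(dpi=300, page_in=(8.5, 11.0), columns=4, margin_in=0.5):
    """Return one (left, top, right, bottom) pixel box per column.

    Assumptions (not measured on the real index): equal-width columns,
    and the same margin on all four sides of the page.
    """
    width_px = round(page_in[0] * dpi)    # 8.5 in at 300 dpi -> 2550 px
    height_px = round(page_in[1] * dpi)   # 11 in at 300 dpi -> 3300 px
    margin_px = round(margin_in * dpi)
    usable = width_px - 2 * margin_px     # printable width between margins
    boxes = []
    for i in range(columns):
        # Integer division keeps adjacent boxes exactly contiguous.
        left = margin_px + (usable * i) // columns
        right = margin_px + (usable * (i + 1)) // columns
        boxes.append((left, margin_px, right, height_px - margin_px))
    return boxes


print(column_boxes())
```

Each box is in the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so the boxes could be applied to a scanned page image before OCR. In practice the column gutters should be checked against a few real scans and the margins adjusted.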

Even so, I think the project should start with digitizing the index. Producing a working text file does not depend on the Data Design, so that work can begin right away.

Looking at the index, I thought of another useful way to search. Once we identify a name, it might be VERY useful to see what other names appear on that same page. The surrounding names could confirm that this is the person being sought, and therefore which printed volume to purchase.
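
The same-page search amounts to grouping index entries by page. A minimal sketch follows, assuming each OCR'd entry has been reduced to a (name, volume, page) tuple. The volume field, the function names, and the sample names are all hypothetical.

```python
from collections import defaultdict


def build_page_index(entries):
    """Map each (volume, page) pair to the set of names printed on that page."""
    pages = defaultdict(set)
    for name, volume, page in entries:
        pages[(volume, page)].add(name)
    return pages


def neighbors(name, entries):
    """For each page where `name` appears, return the other names on it, sorted."""
    pages = build_page_index(entries)
    return {
        key: sorted(names - {name})
        for key, names in pages.items()
        if name in names
    }


# Made-up sample data, for illustration only.
sample = [
    ("Alpha, Ann", 1, 10),
    ("Beta, Bob", 1, 10),
    ("Gamma, Carl", 2, 5),
    ("Alpha, Ann", 2, 5),
]
print(neighbors("Alpha, Ann", sample))
# {(1, 10): ['Beta, Bob'], (2, 5): ['Gamma, Carl']}
```

Here the page map is rebuilt on every call to keep the sketch short; a real search would build it once and reuse it.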

Here is the information the index gives us:

  • The exact page where that name appears
  • Which descendant line the person is in
  • For women, whether this is the married or maiden name
  • All other names on that same page. For women, that list should include both maiden and married names
  • The Dwight number for that person. This is not so useful now, but will be tremendously useful once we connect to the Langbehn database, which records the connections among all the people in the Dwight volumes. We can only infer the Dwight number if the person is in one of the Updates volumes, but we also know the descendant line, which can help narrow it down
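
The list above suggests a shape for a single index record. The sketch below is a hypothetical schema, not the Data Design: the field names, the `volume` field, and storing the Dwight number as a string are assumptions. "All other names on that same page" is left out on purpose, since it can be derived by grouping records by page rather than stored in each one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexEntry:
    """One name as printed in the Seven-Volume Index (hypothetical schema)."""
    name: str
    volume: int                          # assumption: which of the seven volumes
    page: int                            # exact page where the name appears
    descendant_line: str                 # descendant line the person belongs to
    name_kind: Optional[str] = None      # "maiden" or "married" for women, else None
    dwight_number: Optional[str] = None  # None until known; format not yet decided
    dwight_inferred: bool = False        # True if inferred rather than read directly

    def linkable(self) -> bool:
        """Only entries with a Dwight number can be joined to the Langbehn database."""
        return self.dwight_number is not None


entry = IndexEntry("Alpha, Ann", 1, 10, "Line A", name_kind="married")
print(entry.linkable())  # False
```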

Preliminary Prototype

Assuming that the User Experience data design is complete, it might be quite feasible to “seed” the database with a couple of dozen entries from the Seven-Volume Index and a few records from the Langbehn Database. These should include some links between Seven-Volume Index entries and (via Dwight number) the relevant person in the Langbehn Database.
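
A seeded prototype could be as small as two tables joined on the Dwight number. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names, the "D-1" style of Dwight number, and the sample rows are placeholders, not the final Data Design.

```python
import sqlite3

# Hypothetical prototype schema; names are placeholders.
SCHEMA = """
CREATE TABLE langbehn_person (
    dwight_number   TEXT PRIMARY KEY,
    full_name       TEXT NOT NULL
);
CREATE TABLE index_entry (
    id              INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    volume          INTEGER NOT NULL,
    page            INTEGER NOT NULL,
    descendant_line TEXT,
    dwight_number   TEXT REFERENCES langbehn_person(dwight_number)
);
"""


def seed(conn, people, entries):
    """Create the tables and load a handful of sample rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO langbehn_person VALUES (?, ?)", people)
    conn.executemany(
        "INSERT INTO index_entry (name, volume, page, descendant_line, dwight_number)"
        " VALUES (?, ?, ?, ?, ?)",
        entries,
    )
    conn.commit()


def linked(conn):
    """Index entries joined to their Langbehn person via the Dwight number."""
    return conn.execute(
        "SELECT e.name, e.volume, e.page, p.full_name "
        "FROM index_entry e JOIN langbehn_person p "
        "ON e.dwight_number = p.dwight_number ORDER BY e.id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
seed(
    conn,
    people=[("D-1", "Alpha, Ann")],
    entries=[
        ("Alpha, Ann", 1, 10, "Line A", "D-1"),  # has a Dwight number: linked
        ("Beta, Bob", 1, 10, "Line A", None),    # no Dwight number yet: not linked
    ],
)
print(linked(conn))  # [('Alpha, Ann', 1, 10, 'Alpha, Ann')]
```

The inner join returns only entries that already carry a Dwight number, which matches the idea that the Dwight number is the bridge between the two sources.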

This would let us demonstrate the concept, which brings us to the next post.