Second Design 04

The Manuscripts in Progress

The SFAA Historians worked on updates to our published books from 1990 to 2004. I have most of this work in the form of Microsoft Word documents. These documents are organized the same way as the published books. That is, each separate Word document covers the descendants of a specific person. Each can be tied to one specific Dwight number and person.

Because these are computer files, I believe that I can mine each document for names. I believe I can catch the name at the beginning of each formatted paragraph, and capture an excerpt such as the paragraph itself.

Since each of these documents was typed by hand, the formatting is not consistent, and capturing the names reliably will take a lot of “fuzzy” matching. This information would therefore be added to the database slowly, over time.
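As a rough sketch of the mining step, the Python below pulls a leading name and an excerpt out of one paragraph of text. It assumes the Word documents have already been exported to plain text, and it assumes a hypothetical paragraph layout (an optional entry number, then the name, ended by the first comma, semicolon, or colon). The real manuscripts may well be laid out differently, so the pattern is a starting point to adjust, not a description of the actual files. The function name extract_name and the sample text are invented for illustration.

```python
import re

# ASSUMPTION: a paragraph begins with an optional entry number, then the
# person's name, and the name ends at the first comma, semicolon, or colon.
LEADING_NAME = re.compile(
    r"""^\s*
        (?:\d+[.)]?\s+)?                 # optional entry number such as 12. or 12)
        (?P<name>[A-Z][^,;:\n]{1,80}?)   # name: starts with a capital letter
        \s*[,;:]                         # first delimiter after the name
    """,
    re.VERBOSE,
)


def extract_name(paragraph, excerpt_len=200):
    """Return (name, excerpt) for one paragraph, or None if no name is found."""
    # Collapse hand-typed runs of spaces, tabs, and line breaks.
    text = " ".join(paragraph.split())
    match = LEADING_NAME.match(text)
    if match is None:
        return None  # leave for manual review rather than guess
    return match.group("name").strip(), text[:excerpt_len]


# Usage with an invented sample paragraph:
print(extract_name("  12.  John   Strong, b. 1 Jan 1800;  m. Mary Smith."))
```

Paragraphs that return None are the ones that would need a looser pattern or a manual look, which is where the slow, over-time part comes in.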

However, you can probably see how this now fits into the scheme of things. As with the table of contents of the published book, each document can be registered with a descendant line, Dwight number and name, and file name/location.
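To make the registration idea concrete, here is a minimal sketch of one registry record per document. The class name ManuscriptDocument, the field names, and the choice to key the registry by file path are all assumptions for illustration. The Dwight number is assumed here to be a plain integer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManuscriptDocument:
    """One Manuscripts in Progress file (hypothetical record layout)."""
    descendant_line: str
    dwight_number: int   # ASSUMPTION: Dwight numbers are plain integers
    person_name: str
    file_path: str


def register(registry, doc):
    """Add a document to the registry dict, keyed by its file path."""
    if doc.file_path in registry:
        raise ValueError(f"already registered: {doc.file_path}")
    registry[doc.file_path] = doc


def documents_for(registry, dwight_number):
    """Return every registered document tied to the given Dwight number."""
    return [d for d in registry.values() if d.dwight_number == dwight_number]


# Usage with invented values:
registry = {}
register(registry, ManuscriptDocument("Example Line", 101, "Example Person", "example_101.doc"))
print(documents_for(registry, 101))
```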

When I mine the document for names, these can become part of a searchable master names list. I expect to capture an excerpt of surrounding text, so that the name has some sort of context useful to the end user / researcher.
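One possible shape for the searchable master names list is a single database table holding each captured name with its Dwight number, source file, and excerpt. The sketch below uses SQLite through Python's standard sqlite3 module. The table and column names are assumptions, and the rows are invented sample data.

```python
import sqlite3


def build_names_db(entries):
    """Create an in-memory names table.

    entries: iterable of (name, dwight_number, file_path, excerpt) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE names ("
        "name TEXT, dwight_number INTEGER, file_path TEXT, excerpt TEXT)"
    )
    conn.executemany("INSERT INTO names VALUES (?, ?, ?, ?)", entries)
    conn.commit()
    return conn


def search_names(conn, fragment):
    """Substring search on the name column, ordered by name.

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    """
    cur = conn.execute(
        "SELECT name, dwight_number, file_path, excerpt FROM names "
        "WHERE name LIKE ? ORDER BY name",
        (f"%{fragment}%",),
    )
    return cur.fetchall()


# Usage with invented rows:
conn = build_names_db([
    ("John Strong", 101, "example_101.doc", "John Strong, b. 1 Jan 1800"),
    ("Mary Smith", 101, "example_101.doc", "Mary Smith, m. John Strong"),
])
for row in search_names(conn, "strong"):
    print(row)
```

Each row returned carries the excerpt, so the researcher sees the name in context along with the file it came from.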

This provides us with two revenue opportunities.

  • First, the names list with text excerpts may allow a person to realize that they have found the right place. They can figure out exactly which volumes of genealogy they’d like to purchase from the SFAA.
  • Second, we can collect and publish (on CD) all of the Manuscripts in Progress for a given descendant line. Thanks to this data design, I might be able to generate a coherent Table of Contents for the CD, giving Dwight number, name, and file name/link, just like the printed books’ tables of contents.
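The Table of Contents for a CD could be generated directly from the registered documents. The sketch below is one way it might look, assuming each document is available as a (Dwight number, name, file name) tuple and that sorting by Dwight number gives a sensible order. The function name format_toc, the column widths, and the sample rows are invented.

```python
def format_toc(documents):
    """Return a plain-text table of contents, sorted by Dwight number.

    documents: iterable of (dwight_number, name, file_name) tuples.
    """
    rows = sorted(documents, key=lambda d: d[0])
    lines = [f"{'Dwight No.':<12}{'Name':<30}File"]
    for number, name, file_name in rows:
        lines.append(f"{number:<12}{name:<30}{file_name}")
    return "\n".join(lines)


# Usage with invented entries:
print(format_toc([
    (250, "Second Example Person", "example_250.doc"),
    (100, "First Example Person", "example_100.doc"),
]))
```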

The ideal solution would be to turn these Manuscripts in Progress into coherent single genealogies. A quarter century later, we haven’t yet managed that. Publishing the manuscripts as they stand at least gets the material out to the persons most interested in it.

The Seven-Volume Index

I do have a printed (paper) index of all names in all seven printed volumes, including married names. It may be that, once the index is commercially scanned, I can reprocess the information to create a usable names index. Each entry contains the descendant line, volume, and page number where the name appears. From the page number, in most cases, we can infer the relevant Dwight number.
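One way the page-to-Dwight-number inference might work is a lookup against the starting page of each section in a volume. This is only a sketch under an assumption: that for each volume we can list the page on which each Dwight-number section begins (for example, from that volume's table of contents). The function name and the page numbers below are invented.

```python
from bisect import bisect_right


def dwight_number_for_page(sections, page):
    """Return the Dwight number of the section that contains the page.

    sections: list of (start_page, dwight_number) for ONE volume,
    sorted by start_page. Returns None if the page comes before the
    first section (front matter, for instance).
    """
    start_pages = [start for start, _ in sections]
    i = bisect_right(start_pages, page) - 1
    if i < 0:
        return None
    return sections[i][1]


# Usage with invented section boundaries:
sections = [(1, 100), (45, 250), (120, 4000)]
print(dwight_number_for_page(sections, 60))
```

A page on which one section ends and the next begins is ambiguous. This sketch assigns it to the section that begins there, which is one reason the inference only holds in most cases.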

So, if the digitization ever comes to pass, this presents a revenue opportunity. We can publish a complete index of everything in our seven published volumes.

We currently have scans of this index, one PDF file per letter of the alphabet. It might actually be worthwhile to put those online and make them public. People know how to use an index, and might find out which book contains the information they want to purchase.

If we create a proper database from the Seven-Volume Index (which is far in the future), this would also allow us to show all names for a given page. Suppose our researcher finds the right name, but doesn’t know whether it’s the right person of that name. Our researcher could then look at the list of ALL names on that page. If he or she recognizes several of those names, our researcher will know it’s the right page, and be inclined to purchase the book.
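The "all names on this page" lookup amounts to grouping index entries by volume and page. A minimal sketch follows, assuming each digitized index entry can be reduced to a (name, volume, page) tuple (the descendant line is left out for brevity). The function names and the sample entries are invented.

```python
from collections import defaultdict


def build_page_index(entries):
    """Group names by (volume, page).

    entries: iterable of (name, volume, page) tuples.
    Returns a dict mapping (volume, page) to a sorted list of names.
    """
    pages = defaultdict(list)
    for name, volume, page in entries:
        pages[(volume, page)].append(name)
    return {key: sorted(names) for key, names in pages.items()}


def names_on_page(pages, volume, page):
    """Return every indexed name on the given page, or an empty list."""
    return pages.get((volume, page), [])


# Usage with invented index entries:
pages = build_page_index([
    ("John Strong", 2, 145),
    ("Abigail Clark", 2, 145),
    ("John Strong", 3, 20),
])
print(names_on_page(pages, 2, 145))
```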

The Walther Barnard Manuscript

This set of Microsoft Word documents would print out to several thousand pages. It covers the Sarah (Strong) Barnard line. It’s so large that it gets its own CD.

As with the other Manuscripts in Progress, I believe it’s feasible to mine the documents for primary names, with excerpt text to give context and distinguishing information.

I don’t think it requires anything separate in the way of data design. It will require separate mining software.