Project Design 04

This project is intended to be of benefit to the SFAA, and it is intended to be feasible with available resources. It’s time, therefore, to see if I can articulate those benefits. I particularly need to be able to explain how we can make use of volunteer help.

So. How do we put this project in practical terms?

The ideal would be to assemble and publish updated genealogical information. Without major crowdsourcing and funding, that’s not going to happen. Let’s look at what CAN be done and be useful, and what effort is involved.

Seven-Volume Index

I’ve already described this, and I think it’s the most useful starting point. It’s a one-shot deal in that it won’t ever change. Once we’ve done it, we’ve got it.

The Langbehn Database

There are several reasons why this is important.

  • John Langbehn, over the course of I don’t know how many years, has been quietly working on this out of his own interest. It would be a tremendous win to find a way to connect this up and show a clear, direct outcome of his transcription work.
  • Everything we have is classified by Dwight number. The Langbehn Database IS the Dwight numbers. Within the Langbehn Database, you can click around, explore family lines, and so on. Connecting things to the Langbehn Database would be a huge win. Anything we add that has a Dwight number gets linked automatically to the Langbehn Database.

We need to allow for the fact that I get updates from time to time as John continues to transcribe. With this database as the centerpiece, there may be reason for volunteers to dive in and help with the transcription. Folding in multiple databases can be tricky, but that’s a solvable problem.

The Langbehn Database is different from other GEDCOM files so far as our data design goes. I expect that all other GEDCOM files represent a SINGLE Dwight number, or a small set of Dwight numbers.

With all artifacts and collections (other than the Langbehn Database and the Seven-Volume Index), I expect to have a small list of Dwight numbers attached to the collection as a whole.

On the other hand, the Langbehn Database is an integral part of the design in its own right. Here, every item in the collection IS a Dwight number, and each item should have its own link registered.
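To make the two cases concrete, here is a minimal sketch in Python. Everything in it is my own placeholder, not a settled design: the class names, the per_item_dwight flag, the links_for helper, and the made-up Dwight numbers like "D-0001" are all assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Item:
    ref: str                              # transcriber's reference number (hypothetical)
    dwight_number: Optional[str] = None   # only filled in for Langbehn-style databases


@dataclass
class Collection:
    name: str
    per_item_dwight: bool = False         # True for the Langbehn Database and friends
    dwight_numbers: List[str] = field(default_factory=list)  # collection-level list
    items: List[Item] = field(default_factory=list)


def links_for(collection: Collection) -> List[Tuple[str, str, Optional[str]]]:
    """Return (dwight_number, collection_name, item_ref_or_None) tuples.

    Langbehn-style: one link per item, pointing at that item.
    Everything else: one link per Dwight number, pointing at the whole collection.
    """
    if collection.per_item_dwight:
        return [
            (item.dwight_number, collection.name, item.ref)
            for item in collection.items
            if item.dwight_number is not None
        ]
    return [(number, collection.name, None) for number in collection.dwight_numbers]
```

The point of the sketch is that both kinds of collection end up producing the same shape of link, so anything downstream only has to ask "what links exist for this Dwight number?" without caring which kind of collection they came from.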

It’s also possible that we might have multiple Langbehn Databases. That is, if we have additional transcribers, it would make sense for the additional transcribers to submit their databases directly as separate GEDCOM files rather than trying to merge with John Langbehn’s own database.

I do not intend to keep multiple versions of the same GEDCOM database online. When I get an updated GEDCOM file I intend to delete the old version and import the new version. That makes it important that the transcriber is using fixed “reference numbers”.

Hmm. Come to think of it, maybe not. If the User Experience data design allows for my deleting and replacing a collection, that means that all links would be regenerated. It really doesn’t matter if the link or foreign key changes, if we’re linking from table to table.
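Here is a rough sketch of that delete-and-replace idea, using Python’s built-in sqlite3 module. The table names, column names, and the replace_collection function are hypothetical; I’m only assuming that links live in their own table and can be rebuilt from whatever the new import contains.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a made-up three-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE collection (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE item (
            id INTEGER PRIMARY KEY,
            collection_id INTEGER REFERENCES collection(id),
            dwight_number TEXT
        );
        CREATE TABLE link (
            dwight_number TEXT,
            item_id INTEGER REFERENCES item(id)
        );
        """
    )
    return conn


def replace_collection(conn: sqlite3.Connection, name: str, dwight_numbers: list) -> None:
    """Delete any old version of the collection, import the new one, rebuild links."""
    with conn:  # one transaction: commit on success, roll back on error
        row = conn.execute("SELECT id FROM collection WHERE name = ?", (name,)).fetchone()
        if row is not None:
            old_id = row[0]
            conn.execute(
                "DELETE FROM link WHERE item_id IN "
                "(SELECT id FROM item WHERE collection_id = ?)",
                (old_id,),
            )
            conn.execute("DELETE FROM item WHERE collection_id = ?", (old_id,))
            conn.execute("DELETE FROM collection WHERE id = ?", (old_id,))
        new_id = conn.execute("INSERT INTO collection (name) VALUES (?)", (name,)).lastrowid
        for number in dwight_numbers:
            item_id = conn.execute(
                "INSERT INTO item (collection_id, dwight_number) VALUES (?, ?)",
                (new_id, number),
            ).lastrowid
            # The link is regenerated from the fresh row, so old keys never matter.
            conn.execute(
                "INSERT INTO link (dwight_number, item_id) VALUES (?, ?)",
                (number, item_id),
            )
```

Calling replace_collection a second time with an updated list throws away every old item and link for that collection and builds new ones. Nothing outside these tables holds on to an item id, which is why the internal keys are free to change between imports.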

With the Langbehn Database and friends, each item has a Dwight number, which is our point of linking. In all other GEDCOM databases, the collection as a whole has a small set of Dwight numbers, and they are the point of linking. There, we link to the collection/database as a whole rather than to an item in the collection.

Oops

Well, I got sidetracked again. I’m talking about design. I’ll start a fresh post from the SFAA perspective.