Mining the Historian Archives 03

Genealogy Databases

I personally have a number of genealogy databases, in actual database form:

  1. The John Langbehn database. For many years, John has been typing the contents of Dwight’s History of the Strong Family into Family Tree Maker, a Windows-based genealogy program. Each person in the Langbehn database has the Dwight number typed into the “User Reference Number” field.
  2. I have previous editions of the same database from John Langbehn.
  3. I have two versions of my own research database.
  4. I have other databases contributed by related persons.

I have commercial software (TNG, The Next Generation of Genealogy Site Building) capable of importing each of the above databases into its MySQL database and displaying the information as web pages.

Since TNG is capable of maintaining each family tree database separately, there is no specific need to merge databases. You will recall that I have concluded it is ALWAYS a bad idea to merge groups of genealogy data. The result is a tangled mess. That in fact is why I have two versions of my own research database: I made a mess of the first to the point that I started over from scratch.

At the same time, there is a strong need to merge equivalent records. You don’t want to search for the same person in six different places. This, of course, is a problem that genealogists have been addressing for decades with computers, and for centuries in paper form.

Universal Classification System

I still have not described the different types of information. But, generally, every piece of information can be classified in relation to a Dwight number. It’s not difficult to invent a separate numbering system for items which don’t get a Dwight number, either because the person or persons involved are known to NOT descend from Elder John Strong, or because we don’t know the actual connection (if any) to Elder John Strong.

In addition to the Dwight classification, each item naturally needs its own unique identification. But consider a document containing a hundred names and their genealogical data. We might catalog the document, but we might also catalog each name within the document. I’ll get to that aspect later.

What do we use as a Universal Classification System? I don’t know. Yet!

A Concrete Example

Suppose someone submits family data in electronic form. The data are in GEDCOM (Genealogical Data Communication) format, which is the standard format for exchanging genealogical data. Here is what I envision us doing.

  1. Import the data to TNG. This allows people to browse and search the data online.
  2. Classify the file. The file as a whole probably relates to one specific Dwight number, so every person in the file can receive that same Dwight classification.
  3. Add each person in the file to our universal list, the subject of this project.
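
To make steps 2 and 3 a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how TNG imports a file: it reads just the INDI (individual) records and their NAME lines from GEDCOM text and ignores everything else. The names IndexEntry and index_gedcom, the "source" label, and the sample people are all invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One line in the hypothetical universal list."""
    dwight_number: str  # classification shared by the whole file
    source: str         # made-up label for the imported database
    person_id: str      # GEDCOM cross-reference id, e.g. "I1"
    name: str


def index_gedcom(text, dwight_number, source):
    """Return one IndexEntry per individual found in GEDCOM text."""
    entries = []
    current_id = None   # id of the INDI record being read, if any
    have_name = False   # only the first NAME of each person is used
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level, second = parts[0], parts[1]
        rest = parts[2] if len(parts) > 2 else ""
        if level == "0":
            # A level-0 line starts a new record; only INDI records are people.
            current_id = second.strip("@") if rest == "INDI" else None
            have_name = False
        elif level == "1" and second == "NAME" and current_id and not have_name:
            # GEDCOM marks the surname with slashes: "John /Example/".
            name = " ".join(rest.replace("/", " ").split())
            entries.append(IndexEntry(dwight_number, source, current_id, name))
            have_name = True
    return entries


# Made-up sample data, not taken from any real contributed file.
SAMPLE = """0 HEAD
0 @I1@ INDI
1 NAME John /Example/
1 SEX M
0 @I2@ INDI
1 NAME Jane /Example/
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
0 TRLR"""

for entry in index_gedcom(SAMPLE, "27454", "contributed-file-1"):
    print(entry.dwight_number, entry.source, entry.person_id, entry.name)
```

Every person in the file receives the same Dwight number, because in this sketch the classification belongs to the file as a whole rather than to each individual.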

Note that the universal relationship here is the Dwight number. Any single Dwight number might be assigned to tens of thousands of individuals. My Dwight #27454 is in fact assigned to tens of thousands of documented individuals.

What does that get us?

  1. No matter where you are in our databases, you can click to a list of “all resources related to this Dwight number.” One line item would refer to the above-imported database as a whole.
  2. No matter where you are in our databases, you can click to a list of “all resources attached to this specific name.”
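
The two kinds of lookup above can be sketched as a small in-memory index. This is a rough Python illustration under my own assumptions, not a description of TNG or of any finished design: the class ResourceIndex, its method names, and the resource titles are hypothetical, and a real version would presumably live in database tables rather than in dictionaries.

```python
from collections import defaultdict


class ResourceIndex:
    """Hypothetical index from Dwight numbers and names to resources."""

    def __init__(self):
        self._by_dwight = defaultdict(list)
        self._by_name = defaultdict(list)

    @staticmethod
    def _key(name):
        # Assumption: names match if they agree after lowercasing and
        # collapsing whitespace. Real name matching is much harder.
        return " ".join(name.lower().split())

    def add(self, dwight_number, title, names=()):
        """Register a resource under a Dwight number and any names in it."""
        self._by_dwight[dwight_number].append(title)
        for name in names:
            self._by_name[self._key(name)].append(title)

    def for_dwight(self, dwight_number):
        """All resources related to this Dwight number."""
        return list(self._by_dwight.get(dwight_number, []))

    def for_name(self, name):
        """All resources attached to this specific name."""
        return list(self._by_name.get(self._key(name), []))


index = ResourceIndex()
# Made-up resources for illustration only.
index.add("27454", "Contributed database 1", ["John Example", "Jane Example"])
index.add("27454", "Scanned letter", ["Jane Example"])

print(index.for_dwight("27454"))
print(index.for_name("jane  example"))
```

The imported database appears once as a whole under its Dwight number, and again under each name it contains, which is what lets a single click from either direction reach it.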

Data Relationships

If you are doing some sort of genealogical research, you’re going to land on something related to a specific person. You are going to want to know that person’s spouse, siblings, children, parents, and so on. We are NOT explicitly addressing those relationships as part of this project. We do address them indirectly, as I’ll explain below.

If you are doing research related to a specific person or family group, what we CAN do is tell you all of the resources we have related to that person or family group. This falls in the area of business intelligence, and it is the value this project is intended to create.

We may well have six different genealogy databases (based on contributed GEDCOM files) relevant to the person you are studying. We have in a sense become our own search engine.

The online GEDCOM files provide you with the family relationships. We present the GEDCOM files as web pages, so you can click around to see parents, children, ancestors, and so on. If we have the information, we now make it possible for you to find it.