Second Design 02

The Langbehn Database, Again

Let’s look at The Langbehn Database from a data design perspective.

The Langbehn Database is stored in a collection of tables defined by TNG (The Next Generation of Genealogy Sitebuilding). The TNG product site is here: http://lythgoes.net/genealogy/software.php. The software itself is well supported, with an extremely active mailing list.

As part of our data design, I intend to add a link to each TNG-generated page showing our other resources related to that person. The information could also be part of a tooltip popup. The implementation details are outside this data design; I’ve done this sort of thing with TNG before.

What we DO need from this data design is provision for an SQL query that returns that information, given the relevant Dwight reference number.

We also need to design the reverse direction. For a given Dwight reference/classification number, we need to be able to provide a Web link to the page in The Langbehn Database (as displayed by TNG).

Many people have the same name. Our published genealogies contain a LOT of different persons all named John Strong! We therefore need to extract from the Langbehn database distinguishing information such as:

  • Birth/Marriage/Death dates where known
  • Spouse name(s) where known
  • Descendant line. We need to indirectly derive this; see below
  • The Web link (URL) to this person’s page in the TNG display area

Note that as Langbehn Database updates are received, this information gets replaced and re-imported.

The import process is outside of this data design. This data design needs to provide for storing (and deriving) the above information, and allow for replacement of the information by re-import.
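As a rough sketch of the storage and replace-on-re-import requirement, here is one possible shape using Python's built-in sqlite3 module. The table name langbehn_person, its columns, the helper functions, and the example.org URLs are all assumptions made for illustration. The real columns would depend on what the TNG tables actually provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE langbehn_person (
        dwight_number INTEGER NOT NULL,  -- Dwight reference/classification number
        full_name     TEXT NOT NULL,
        birth_date    TEXT,              -- NULL where not known
        marriage_date TEXT,
        death_date    TEXT,
        spouse_names  TEXT,
        tng_url       TEXT NOT NULL      -- link to this person's TNG page
    )
""")
conn.execute("CREATE INDEX idx_person_dwight ON langbehn_person (dwight_number)")


def reimport(conn, rows):
    """Throw away the previous extract and load the new one, atomically."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM langbehn_person")
        conn.executemany(
            "INSERT INTO langbehn_person VALUES (?, ?, ?, ?, ?, ?, ?)", rows
        )


def tng_links(conn, dwight_number):
    """Reverse direction: Dwight number -> TNG page link(s)."""
    cur = conn.execute(
        "SELECT tng_url FROM langbehn_person "
        "WHERE dwight_number = ? ORDER BY tng_url",
        (dwight_number,),
    )
    return [url for (url,) in cur]


# Placeholder rows: names from the Sarah line, made-up URLs, dates left unknown.
extract = [
    (27, "Sarah Strong", None, None, None, "Joseph Barnard",
     "https://example.org/tng/person-27"),
    (28, "Joseph Barnard", None, None, None, "Sarah Strong",
     "https://example.org/tng/person-28"),
]
reimport(conn, extract)
reimport(conn, extract)  # a second import replaces rather than duplicates
```

The point of the design choice is that nothing in this table is edited by hand, so a re-import can simply delete everything and reload. Any data we maintain ourselves would need to live in separate tables keyed by Dwight number.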

The descendant line means which child of Elder John Strong the person descends from. That’s an important classification from a revenue standpoint, because we sell material according to descendant line.

By scanning Dwight’s History, I can make a list of which number ranges attach to which descendant line. There is an accidental overlap of about a hundred numbers, due to an error on Dwight’s part. We likely don’t have additional information for any persons in that overlapping range, so we can arbitrarily assign that range to one of those two lines.

We can therefore construct a table which maps descendant line to number ranges. It’s worthwhile having a separate table of Descendant Line definitions, which can include abbreviations and descriptive text. That table should not contain the Dwight ranges, because there are multiple ranges per descendant line.

For example, the Sarah line includes #27-#28 (Sarah Strong and husband Joseph Barnard), and #27449-#27542 (their descendants). My own classification number, #27454, fits in that range.

Thus one requirement is: Given a Dwight classification number, determine the Descendant Line.
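A minimal sketch of that lookup, again using Python's built-in sqlite3 module. The table names descendant_line and dwight_range, their columns, and the function name are assumptions for illustration. Only the Sarah line ranges quoted above are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per descendant line: abbreviation and descriptive text only.
    CREATE TABLE descendant_line (
        line_id      INTEGER PRIMARY KEY,
        abbreviation TEXT NOT NULL,
        description  TEXT
    );
    -- Many ranges per line, so the ranges live in their own table.
    CREATE TABLE dwight_range (
        line_id      INTEGER NOT NULL REFERENCES descendant_line (line_id),
        first_number INTEGER NOT NULL,
        last_number  INTEGER NOT NULL,
        CHECK (first_number <= last_number)
    );
    INSERT INTO descendant_line
        VALUES (1, 'Sarah', 'Sarah Strong and husband Joseph Barnard');
    INSERT INTO dwight_range VALUES (1, 27, 28);
    INSERT INTO dwight_range VALUES (1, 27449, 27542);
""")


def descendant_line_for(conn, dwight_number):
    """Return the line abbreviation for a Dwight number, or None if unmapped."""
    row = conn.execute(
        """
        SELECT d.abbreviation
        FROM dwight_range AS r
        JOIN descendant_line AS d ON d.line_id = r.line_id
        WHERE ? BETWEEN r.first_number AND r.last_number
        """,
        (dwight_number,),
    ).fetchone()
    return row[0] if row else None


print(descendant_line_for(conn, 27454))  # prints "Sarah"
print(descendant_line_for(conn, 29))     # prints "None": no range loaded for #29
```

This assumes the overlapping range has already been assigned to just one line when the ranges are entered, so that any given number matches at most one row.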