Mining the Historian Archives 01
For over five years, I have been trying to find a way to publish information from the Historian Archives of the Strong Family Association of America (SFAA).
The SFAA published seven volumes of genealogy in the 1980s and 1990s. In my basement I have 24 shelf-feet of upright file folders. For a quarter century we have been collecting information that has never yet seen the light of day.
One effort has been to convert this information to database form. There are a number of genealogy-database programs for Macs and PCs that share a standard data interchange format, and these programs can generate genealogy books from the database. In theory, then, as new information comes in, it can be added to the “master database” and new versions of our books generated.
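The standard interchange format these programs share is GEDCOM. As a rough illustration, assuming a GEDCOM 5.5-style export in which each line reads `level [@xref@] tag [value]`, a few lines of Python can pull each individual's name out of such a file. This is a simplified sketch, not a full GEDCOM parser: it ignores CONC/CONT continuation lines, character-set handling, and additional NAME entries for the same person. The function name `individual_names` is my own.

```python
import re

# A GEDCOM line: "<level> [@XREF@] <TAG> [value]"
LINE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")

def individual_names(gedcom_text: str) -> dict[str, str]:
    """Map each INDI cross-reference (e.g. '@I1@') to its first NAME value.

    Simplified sketch: ignores CONC/CONT continuations and extra NAME records.
    """
    names: dict[str, str] = {}
    current = None  # xref of the INDI record we are currently inside, if any
    for raw in gedcom_text.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue
        level, xref, tag, value = m.groups()
        if level == "0":
            # A new top-level record begins; track it only if it is an individual.
            current = xref if tag == "INDI" else None
        elif current and level == "1" and tag == "NAME" and current not in names:
            # GEDCOM marks the surname with slashes: "David Johnston /Barnard/"
            names[current] = " ".join((value or "").replace("/", " ").split())
    return names

sample = """0 HEAD
1 CHAR UTF-8
0 @I1@ INDI
1 NAME David Johnston /Barnard/
1 SEX M
0 @I2@ INDI
1 NAME Esther Eva /Agnew/
0 TRLR"""

print(individual_names(sample))
# {'@I1@': 'David Johnston Barnard', '@I2@': 'Esther Eva Agnew'}
```

Extracting names this way is cheap; the expensive part, as described next, is getting the information into the database in the first place.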
Five years on, we have nothing to show for these efforts, and it is likely to be another 5 to 10 years before this approach produces anything publishable.
We have large collections of data in many different forms. In theory, all of this could be converted to database form (i.e., entered by hand by a genealogically knowledgeable human). In practice, the project is far too large for the limited, occasional help available to us.
It now occurs to me that this could be treated as a Business Intelligence problem. We can build the means to answer questions such as:
- Does the SFAA have any information on my great-grandfather David Johnston Barnard?
- Do you have any information on his descendants?
- Do you have any information on his ancestors?
- My grandfather was John Davis Barnard who married Esther Eva Agnew. Am I connected to the Strong family?
My proposed initial objective, then, is this:
- Show what information we have, if any, for persons of a given name.
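A minimal sketch of that objective, assuming each item in the archives (a file folder, a page in a published volume, an entry in an online database) has been catalogued with the names it mentions. The `Resource` type, its fields, and the resource IDs below are hypothetical. Names are normalized only for case, punctuation, and spacing; the index deliberately does not guess that two different spellings are the same person.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    # Hypothetical catalogue entry for one item in the archives.
    resource_id: str
    description: str
    names: tuple[str, ...]  # names as written in the source

def normalize(name: str) -> str:
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", name).lower().split())

def build_name_index(resources):
    """Map each normalized name to every resource that mentions it."""
    index = defaultdict(list)
    for r in resources:
        for n in {normalize(x) for x in r.names}:
            index[n].append(r)
    return index

def lookup(index, name):
    """Return every resource mentioning this name, or an empty list."""
    return list(index.get(normalize(name), []))

# Usage with invented catalogue entries:
catalogue = [
    Resource("F-0132", "Folder: Barnard correspondence",
             ("David Johnston Barnard", "Esther Eva Agnew")),
    Resource("V3-p211", "Published volume, page entry", ("David J. Barnard",)),
]
index = build_name_index(catalogue)
print([r.resource_id for r in lookup(index, "david johnston BARNARD")])  # ['F-0132']
```

Note that "David J. Barnard" stays a separate entry. Whether he is the same man as David Johnston Barnard is a question for the researcher, not for the index.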
I see multiple ways of arriving at that list of information:
- I have several genealogical databases online at the SFAA Web site, covering several tens of thousands of individuals. Anyone can browse the information by clicking links to move from parent to parent, displaying distant relationships, and so on. People can search and explore these existing online databases to arrive at a person of interest. On each person's page, I could then provide a link to our list of all the resources we have related to that person, or to persons of that name.
- I can create a list of names that is searchable in the usual ways.
- The list can be found via search engines such as Google, and perhaps by specialized genealogical search engines as well.
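One way to serve all three routes at once, sketched here under assumptions of my own rather than as a settled design, is to generate one plain static HTML page per name with a stable URL derived from that name. A link on each person's page in the online databases can point at the page, and search engines can crawl it like any other. The `names/<slug>.html` layout and the function names are hypothetical.

```python
import html
import re
from pathlib import Path

def slug(name: str) -> str:
    """Stable, URL-safe file name for a name, e.g. 'david-johnston-barnard'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def name_page(name: str, entries: list[tuple[str, str]]) -> str:
    """Render a minimal HTML page listing (resource_id, description) pairs."""
    title = html.escape(name)
    items = "\n".join(
        f"  <li>{html.escape(rid)}: {html.escape(desc)}</li>" for rid, desc in entries
    ) or "  <li>No resources catalogued yet.</li>"
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>SFAA resources for {title}</title></head>\n"
        f"<body><h1>{title}</h1>\n<ul>\n{items}\n</ul></body></html>\n"
    )

def write_pages(index: dict[str, list[tuple[str, str]]], out_dir: Path) -> None:
    """Write one page per name into out_dir (e.g. a hypothetical 'names/' folder)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, entries in index.items():
        (out_dir / f"{slug(name)}.html").write_text(
            name_page(name, entries), encoding="utf-8"
        )
```

Plain static pages are the simplest thing a crawler can index. The slug is ASCII-only and can collide (for example "O'Brien" and "O Brien" both become "o-brien"), so a real version would need a rule for that.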
In other words, it’s important that a person can arrive at the list via
- known family relationships, or
- known names, or
- various general search techniques.
Data Integrity and Consistency
Genealogy is fraught with data inconsistencies. Here are some relevant examples:
- Numerous people have the same name. I know of several other persons named Edward Barnard.
- Two records which have been “proven” to refer to the same person or family may be later disproved.
- Any number of records “might” refer to the same person or family – or they might not.
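One way to accommodate all three cases, assuming a data model of my own rather than anything the existing genealogy programs provide, is never to overwrite or combine records. Instead, each claim that two records describe the same person is stored as a separate assertion with a status and a stated basis. Disproving a claim then means changing one assertion, not unpicking a merged family. The `SameAs` type, its status values, and the record IDs are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    POSSIBLE = "possible"
    PROVEN = "proven"
    DISPROVED = "disproved"

@dataclass
class SameAs:
    """A claim that two separately kept records may describe the same person."""
    record_a: str
    record_b: str
    status: Status
    basis: str  # who made the claim, and on what evidence

def possibly_same(record_id: str, claims: list[SameAs]) -> list[str]:
    """Other records that might be this person, skipping disproved claims.

    The underlying records themselves are never altered or merged.
    """
    result = []
    for c in claims:
        if c.status is Status.DISPROVED:
            continue
        if c.record_a == record_id:
            result.append(c.record_b)
        elif c.record_b == record_id:
            result.append(c.record_a)
    return result

claims = [SameAs("george-bapt-1", "george-bapt-2", Status.POSSIBLE,
                 "both name parents George and Sarah")]
print(possibly_same("george-bapt-1", claims))  # ['george-bapt-2']
claims[0].status = Status.DISPROVED            # later evidence separates them
print(possibly_same("george-bapt-1", claims))  # []
```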
There is a strong inclination to merge data and eliminate duplicates. That makes sense. However, in practice, this produces countless disasters.
Consider, for example, a certain George Barnard. Two of his sons have children, and as it happens, both name their first son after their father, George Barnard. As is not uncommon, both sons married at about the same time and, to no one's surprise, both had their first sons in the same year.
This means we have two George Barnards born the same year. They are first cousins, both having the same paternal grandfather George Barnard. So far, so good.
What we did NOT know, looking back 150 years later, is that each of the two cousins named George Barnard married a woman named Sarah or Sally. We find baptismal records of various children born to George and Sarah (or Sally, a nickname for Sarah), but we have no record of either mother’s maiden name.
In such a situation, it’s common to decide that all of those baptismal records are for the children of a single George and Sarah/Sally Barnard. We merge the two families into one family in our database, and all of the known descendants become intermingled as well.
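The failure is easy to reproduce with a naive merge key. In this sketch the baptism records are invented for illustration. As in the George Barnard case, they carry only the father's name and the mother's given name. Treating "Sally" as "Sarah" and grouping on (father, mother) turns two real families into one.

```python
from collections import defaultdict

# Invented baptism records: (child, father, mother's given name as recorded).
# Suppose children 1-2 belong to one cousin George and 3-4 to the other.
baptisms = [
    ("Child 1", "George Barnard", "Sarah"),
    ("Child 2", "George Barnard", "Sarah"),
    ("Child 3", "George Barnard", "Sally"),
    ("Child 4", "George Barnard", "Sarah"),
]

NICKNAMES = {"Sally": "Sarah"}  # Sally is a common nickname for Sarah

def naive_families(records):
    """Group children by (father, canonical mother's name): the tempting merge."""
    families = defaultdict(list)
    for child, father, mother in records:
        key = (father, NICKNAMES.get(mother, mother))
        families[key].append(child)
    return dict(families)

print(naive_families(baptisms))
# {('George Barnard', 'Sarah'): ['Child 1', 'Child 2', 'Child 3', 'Child 4']}
```

Nothing in the records themselves distinguishes the two couples, so no cleverer key would help. The only safe option is to leave the four records as four records.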
I have, over the years, concluded that merging sets of records is ALWAYS a bad idea. Unless you have personal knowledge of the situation (e.g., all persons named Victoria born to my own parents safely refer to my one and only sister), don’t merge.
A major requirement of this project, then, is to support the concept of NOT merging.
In fact, this requirement becomes an innovation. Because we never have to decide which records refer to the same person, we can make all of our information available as it stands, organized by name.