Mining the Historian Archives 06

Twenty-Four Shelf Feet

I have twenty-four shelf feet of file folders. Each file folder is labeled with a name and a Dwight number. It has proven to be completely impractical to process or even inspect each item within each file folder.

However, it might be possible to process the file folders themselves. I can envision a simple data-entry mechanism such as a spreadsheet to use as a means of inventorying the file folders.

Each file folder has a Dwight number and a name. Those are our two key lookup mechanisms.

What does the file folder by itself buy us? When someone is looking at what’s available, either by name or by Dwight classification, we have the fact that a folder DOES exist with submitted information. The Historian could, upon request, go and look inside that folder to see what we have.
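
As a rough sketch only, here is one way the folder-level inventory could be held in memory so that either lookup answers the question "does a folder exist?" The class and method names (Folder, FolderIndex, find_by_dwight, find_by_name) are made up for illustration, and storing the Dwight number as text is an assumption. The sample folder is the one named later in these notes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Folder:
    """One physical file folder. Either label may be missing."""
    dwight_number: Optional[str] = None  # assumption: stored as text
    name: Optional[str] = None


class FolderIndex:
    """Records only that a folder exists, not what is inside it."""

    def __init__(self) -> None:
        self._by_dwight = defaultdict(list)
        self._by_name = defaultdict(list)

    def add(self, folder: Folder) -> None:
        # Index the folder under whichever labels it actually has.
        if folder.dwight_number:
            self._by_dwight[folder.dwight_number].append(folder)
        if folder.name:
            self._by_name[folder.name.casefold()].append(folder)

    def find_by_dwight(self, number: str) -> List[Folder]:
        return list(self._by_dwight.get(number, []))

    def find_by_name(self, name: str) -> List[Folder]:
        # Name lookup ignores letter case.
        return list(self._by_name.get(name.casefold(), []))


index = FolderIndex()
index.add(Folder(dwight_number="27454", name="Joseph Barnard Jr."))

for hit in index.find_by_name("joseph barnard jr."):
    print("Folder exists:", hit.dwight_number, hit.name)
```

An empty result means only that no folder has been inventoried under that key. It does not mean that nothing was ever submitted.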

We could, in theory, inventory or transcribe the contents of each folder. The problem is that it’s an impossible task as things stand today. The best we can do for now is broadcast the fact that the folder exists, and broadcast that fact in the relevant context.

I see this folder inventory as a low priority. It’s too much work for too little gain. However, the data design should incorporate a means of handling the data if we produce it.

In the same way, I don’t see a need to design a way of importing this data. For example, I don’t see any need to design a spreadsheet format, design an import mechanism, etc. The folder inventory may never come to pass.

On the other hand, it’s important to allow for the possibility that we have a large number of folders labeled with a Dwight number and/or a name. We’ll see why in the next item.

Submitted Information

We get submitted genealogical information in many different forms. It might come from a membership application. It could be printed pages from an ancestry.com family tree. It might be an article, a GEDCOM file, and so on.

A hundred years from now, the submitter of the information becomes a genealogical fact himself or herself. Therefore submitter and/or member information is relevant. We don’t want information on living persons to be public, and therefore we need to track whether any person is or is not living.
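
A minimal sketch of the living/not-living flag, under stated assumptions: the Person class and is_publishable function are hypothetical names, and treating a person of unknown status as not publishable is my own cautious assumption rather than a settled rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    # True = living, False = deceased, None = not yet determined.
    living: Optional[bool] = None


def is_publishable(person: Person) -> bool:
    # Assumption: only persons known to be deceased are made public,
    # so an undetermined status is handled the same way as "living".
    return person.living is False


# Placeholder people, not real records.
people = [
    Person("Example Submitter", living=True),
    Person("Example Ancestor", living=False),
    Person("Example Unknown"),
]
public = [p.name for p in people if is_publishable(p)]
print(public)
```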

We can have several levels of repositories. For example, those twenty-four shelf feet of file folders are a repository. An individual file folder represents a collection of several (possibly unrelated) items. Each item should likely be considered a data collection.

It unfortunately makes sense, then, to track this sort of repository information. It absolutely makes sense from a genealogical standpoint to state the precise source of information, and the precise location of that source.

There is a genealogical standard for source citation. However, that is NOT the same as keeping track of our own stuff’s location. “Ed’s basement” or “Ed’s external hard drive” are valid locations.

In fact it might make the most sense to adopt a tree structure to show location:

  • Ed’s House -> The SFAA Historian Archives (file folders) -> Folder #27454 Joseph Barnard Jr. -> Walther Barnard Correspondence
  • Ed’s Computer -> SFAA-2 Hard Drive -> SFAA Archives -> Submitted GEDCOM Files -> Kat Steckler -> Steckler.ged

From that a source citation can be derived, but that’s a point explicitly outside the scope of this project.
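
The tree structure above could be sketched as a chain of nodes, each pointing at its parent. This is only an illustration: the Location class and its path method are invented names, and the labels are copied from the first example path.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Location:
    """One level in the location tree, linked to the level above it."""
    label: str
    parent: Optional["Location"] = None

    def path(self) -> List[str]:
        # Walk upward to the root, then reverse to get outermost-first order.
        node: Optional["Location"] = self
        parts: List[str] = []
        while node is not None:
            parts.append(node.label)
            node = node.parent
        return list(reversed(parts))

    def __str__(self) -> str:
        return " -> ".join(self.path())


house = Location("Ed’s House")
archives = Location("The SFAA Historian Archives (file folders)", house)
folder = Location("Folder #27454 Joseph Barnard Jr.", archives)
item = Location("Walther Barnard Correspondence", folder)

print(item)
```

Because every node knows only its parent, moving a whole folder to a different repository means changing one link rather than rewriting every item inside it.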

What, then, is within the scope?

  • For any artifact, be able to collect and store the information needed to generate a source citation. There is a standard format which I’ll need to research. We want to make this as hassle-free as possible.
  • For any artifact, be able to collect and store the location/repository location. This might be a web address; this might be hierarchical information such as above.
  • For any artifact, be able to assign a Dwight number or numbers, and/or the non-Dwight classification(s).
  • For any artifact, be able to assign a name or small list of names to the artifact as a whole.
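
The four in-scope items might map onto a single record per artifact, roughly as sketched below. Everything here is hypothetical: the Artifact class, its field names, and the reading of 27454 as the Dwight number for the Barnard folder. The citation fields are left as an open dictionary because the standard citation format has not been researched yet.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Artifact:
    title: str
    # Whatever a source citation turns out to need; format still undecided.
    citation_fields: Dict[str, str] = field(default_factory=dict)
    # Either a single web address or a list of hierarchy levels, outermost first.
    location: List[str] = field(default_factory=list)
    dwight_numbers: List[str] = field(default_factory=list)
    # Non-Dwight classification(s), if any.
    classifications: List[str] = field(default_factory=list)
    # A name or small list of names for the artifact as a whole.
    names: List[str] = field(default_factory=list)


def artifacts_for_dwight(artifacts: List[Artifact], number: str) -> List[Artifact]:
    """Return every artifact tagged with the given Dwight number."""
    return [a for a in artifacts if number in a.dwight_numbers]


correspondence = Artifact(
    title="Walther Barnard Correspondence",
    location=[
        "Ed’s House",
        "The SFAA Historian Archives (file folders)",
        "Folder #27454 Joseph Barnard Jr.",
        "Walther Barnard Correspondence",
    ],
    dwight_numbers=["27454"],  # assumption: the folder number is the Dwight number
    names=["Joseph Barnard Jr."],
)

matches = artifacts_for_dwight([correspondence], "27454")
print([a.title for a in matches])
```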

Classification by Descendant Line

We generally divide things up by descendant line, that is, all things related to the descendants of a specific child of Elder John Strong. As we publish additional material, we intend to publish by descendant line – that is, all available information of descendants of Samuel son of Elder John Strong; descendants of Sarah daughter of Elder John Strong; and so on.

It would therefore be useful to have a list of all collections related to a specific descendant line. I can make a list of the Dwight numbers making up each descendant line, and therefore for each Dwight number we can look up the descendant line. Every entry in the Seven-Volume Index, you will recall, explicitly names that person’s descendant line.
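
The lookup described above could work roughly like this. The Dwight numbers shown ("D-001" and so on) are placeholders and not real assignments, the function names are invented, and the check that a Dwight number belongs to exactly one descendant line is my assumption.

```python
from typing import Dict, List, Tuple

# Placeholder data: descendant line -> Dwight numbers making up that line.
DESCENDANT_LINES: Dict[str, List[str]] = {
    "Samuel": ["D-001", "D-002"],
    "Sarah": ["D-003"],
}


def build_line_lookup(lines: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert the per-line lists into Dwight number -> descendant line."""
    lookup: Dict[str, str] = {}
    for line, numbers in lines.items():
        for number in numbers:
            if number in lookup:
                # Assumption: each Dwight number sits in exactly one line.
                raise ValueError(
                    f"{number} listed under both {lookup[number]} and {line}"
                )
            lookup[number] = line
    return lookup


def collections_for_line(
    collections: List[Tuple[str, List[str]]],
    line: str,
    lookup: Dict[str, str],
) -> List[str]:
    """Titles of collections with at least one Dwight number in the line."""
    return [
        title
        for title, numbers in collections
        if any(lookup.get(n) == line for n in numbers)
    ]


LINE_OF = build_line_lookup(DESCENDANT_LINES)

# Placeholder collections: (title, Dwight numbers assigned to it).
collections = [
    ("Folder A", ["D-001"]),
    ("Folder B", ["D-003"]),
    ("Folder C", ["D-002", "D-003"]),
]
samuel = collections_for_line(collections, "Samuel", LINE_OF)
print(samuel)
```

A collection tagged with Dwight numbers from more than one line, like the placeholder "Folder C", shows up in the list for each of those lines.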

List of Collections

As an SFAA researcher, it’s likely that I would want to specifically access/search the Seven-Volume Index. There are certain other collections that I would want to specifically examine. A table of contents linking to each collection would therefore be useful.