This is a consolidation of the “Second Design” posts into a single document. It’s the same text, just put together into a single page for convenience. If I do any editing, it will be to THIS document.
The User Experience
Our data design centers around these features:
- How/where does this item fit into the Strong genealogy? Internally we discern this with the Dwight classification number.
- From this item, we can click to the Langbehn Database, which allows us to explore the early Strong genealogy and family relationships.
- From a given point in the Langbehn Database (or anywhere else, for that matter) we can bring up links to all related resources we have online with the SFAA.
- We can create a master names list as the alternate way of exploring the above.
I don’t think we need to design in any advanced searching capability. That can come later as experience warrants. The Langbehn Database does have advanced search capability.
Other GEDCOM Database Files
If people submit genealogical information to us, they may be able to submit it in the form of GEDCOM files. All personal genealogy software supports data export to GEDCOM format.
Our TNG software can import any number of family trees (GEDCOM files), including attached images and documents if handled properly.
TNG’s advanced search capability can do searches covering a specific tree, or return results for all trees combined. Thus, rather than creating a huge “master database,” I believe it makes far more sense to maintain distinct trees as submitted. There are very strong reasons for not attempting to combine or merge information, but they are outside the scope of this document. TNG allows us to keep things intact as submitted, but do combined searches over all trees at once.
The Manuscripts in Progress
The SFAA Historians worked on updates to our published books from 1990 to 2004. I have most of this work in the form of Microsoft Word documents. These documents are organized the same way as the published books. That is, each separate Word document covers descendants of a specific person. Each can be tied to one specific Dwight number and person.
Because these are computer files, I believe that I can mine each document for names. I believe I can catch the name at the beginning of each formatted paragraph, and capture an excerpt such as the paragraph itself.
Since each of these documents was typed by hand, there is a lot of “fuzzy logic” involved in successfully capturing the names. Thus this information would be added to the database slowly, over time.
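The name-capture idea can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not a finished tool: it assumes the paragraphs have already been pulled out of the Word files as plain text, and that an entry opens with a capitalized name followed by a comma. The pattern, the function name extract_names, and the excerpt length are all hypothetical; the real manuscripts will almost certainly need looser rules.

```python
import re

# ASSUMPTION: an entry paragraph opens with 2 to 5 capitalized words (the name),
# followed by a comma. Hand-typed manuscripts will not always follow this.
NAME_AT_START = re.compile(
    r"^\s*((?:[A-Z][A-Za-z.'-]*\s+){1,4}[A-Z][A-Za-z'-]+)\s*,"
)


def extract_names(paragraphs, excerpt_len=200):
    """Split paragraphs into (found, review).

    found  : list of {"name", "excerpt"} dicts for paragraphs matching the pattern
    review : non-empty paragraphs that did not match, kept for a human to check
    """
    found, review = [], []
    for text in paragraphs:
        stripped = text.strip()
        match = NAME_AT_START.match(text)
        if match:
            # Collapse runs of whitespace inside the captured name.
            name = " ".join(match.group(1).split())
            found.append({"name": name, "excerpt": stripped[:excerpt_len]})
        elif stripped:
            review.append(stripped)
    return found, review
```

Paragraphs that do not match are returned separately rather than discarded, so they can be checked by hand. That fits the expectation that this information gets added slowly, over time.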
The Printed Volumes
This project is not about the already-published books. However, there is a significant revenue opportunity which can be supported by a simple piece of the data design.
Our artifact is the Table of Contents of each book. Each Table of Contents is 1-3 pages long and easily transcribed into a spreadsheet (for example). Here is an excerpt from Volume Three:
THE JOHN STRONG, JR. LINE (below is Dwight #, the person, and page number in this book):
- #3, John Strong, Jr., Page 1
- #39, Hannah Strong, Page 3
- #41, John Strong, Page 12
THE ABIGAIL STRONG CHAUNCEY LINE
- #15, Abigail Strong, Page 199
- #23954, Abigail Brewer, Page 199
- #24042, Nathaniel Chauncey, III, Page 215
and so on.
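As one possible way to turn a transcribed Table of Contents into rows, here is a small Python sketch. It assumes the transcription keeps the layout of the excerpt above: a heading of the form "THE ... LINE", followed by entries of the form "#number, person, Page number". The names TocEntry and parse_toc, and the choice of fields, are illustrative assumptions rather than a settled design.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: entries look like "- #3, John Strong, Jr., Page 1".
ENTRY = re.compile(r"^-?\s*#(\d+),\s*(.+?),\s*Page\s+(\d+)\s*$")
# ASSUMPTION: each family line starts with a heading such as "THE JOHN STRONG, JR. LINE".
HEADING = re.compile(r"^(THE .+? LINE)\b")


@dataclass
class TocEntry:
    volume: int   # which printed volume the entry comes from
    line: str     # heading the entry appears under
    dwight: int   # Dwight number
    person: str
    page: int


def parse_toc(volume, text):
    """Return a list of TocEntry records from transcribed Table of Contents text."""
    entries = []
    current_line = ""
    for raw in text.splitlines():
        row = raw.strip()
        entry = ENTRY.match(row)
        if entry:
            entries.append(TocEntry(volume, current_line, int(entry.group(1)),
                                    entry.group(2), int(entry.group(3))))
            continue
        heading = HEADING.match(row)
        if heading:
            current_line = heading.group(1)
    return entries


# The Volume Three excerpt shown above.
SAMPLE = """\
THE JOHN STRONG, JR. LINE
- #3, John Strong, Jr., Page 1
- #39, Hannah Strong, Page 3
- #41, John Strong, Page 12
THE ABIGAIL STRONG CHAUNCEY LINE
- #15, Abigail Strong, Page 199
- #23954, Abigail Brewer, Page 199
- #24042, Nathaniel Chauncey, III, Page 215
"""

for item in parse_toc(3, SAMPLE):
    print(item.dwight, item.person, item.page)
```

Keeping the Dwight number as its own field matters most here, since that number is what ties a printed page back to a person elsewhere in the design.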
The Langbehn Database, Again
Let’s look at The Langbehn Database from a data design perspective.
The Langbehn Database is represented in a collection of tables defined by TNG (The Next Generation of Genealogy Sitebuilding). The TNG product site is here: http://lythgoes.net/genealogy/software.php. The software itself is well supported with an extremely active mailing list.
As part of our data design, I intend to add a link to each TNG-generated page showing our other resources related to that person. The information could also be part of a tooltip popup. The implementation details are outside this data design; I’ve done this sort of thing with TNG before.
What we DO need from this data design is provision for an SQL query that provides that information, given the relevant Dwight reference number.
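To make that requirement concrete, here is a minimal sketch of what such a lookup could look like. Everything in it is an assumption for illustration: the table name sfaa_resources, its columns, and the sample rows are hypothetical and are not part of TNG's own schema. The sketch uses Python's built-in sqlite3 module with an in-memory database only so that it is self-contained; the query itself is plain SQL and should carry over to another database with little change.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a HYPOTHETICAL resources table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE sfaa_resources (
               id       INTEGER PRIMARY KEY,
               dwight   INTEGER NOT NULL,  -- Dwight reference number
               kind     TEXT NOT NULL,     -- e.g. 'printed volume', 'manuscript'
               title    TEXT NOT NULL,
               location TEXT NOT NULL      -- page number, file name, or URL
           )"""
    )
    # Lookups are always by Dwight number, so index that column.
    conn.execute("CREATE INDEX idx_resources_dwight ON sfaa_resources (dwight)")
    conn.executemany(
        "INSERT INTO sfaa_resources (dwight, kind, title, location) VALUES (?, ?, ?, ?)",
        [
            # First two rows come from the Volume Three excerpt; the third is a placeholder.
            (3, "printed volume", "Volume Three, The John Strong, Jr. Line", "page 1"),
            (39, "printed volume", "Volume Three, The John Strong, Jr. Line", "page 3"),
            (3, "manuscript", "placeholder manuscript entry", "placeholder.doc"),
        ],
    )
    return conn


def resources_for(conn, dwight):
    """Return (kind, title, location) rows for one Dwight number."""
    cursor = conn.execute(
        "SELECT kind, title, location FROM sfaa_resources "
        "WHERE dwight = ? ORDER BY kind, title",
        (dwight,),
    )
    return cursor.fetchall()


conn = make_demo_db()
for kind, title, location in resources_for(conn, 3):
    print(kind, "|", title, "|", location)
```

The point of the sketch is the shape of the lookup: one table keyed by Dwight number, one parameterized query, and a short list of rows that a TNG page or tooltip could render.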
There is a second area which might be a good place for formal Data Design. This area is not time critical, and is separate from the paper document solutions being discussed elsewhere. This design has the aim of revenue generation.
I will describe the specific artifacts which currently exist, and work towards the end user experience. Additional background material is at: http://otscripts.com/category/data-design/.
I have been thinking in terms of a home-grown solution. I now find (via private correspondence) that a comprehensive paper-document-management solution may exist.
Where does that leave us? I’ll lay out some thoughts for discussion.
The intention of this project is to make use of the Historian Archives. I’m taking what I think is a Business Intelligence approach: I’m trying to draw results that are useful to people out of the mountain of information. The ideal would be to publish a dozen new volumes of genealogy, but that’s just not practical.
As the first step, we connect the Seven-Volume Index and the Langbehn Database. That is a great outcome and within current capabilities. It will take some time, but it’s feasible.
As a second step, we connect all other submitted GEDCOM files. This is a huge gain. Before now, I just have not known what to do with submitted information. This is because I’ve been thinking in terms of creating a coherent book or some electronic equivalent.
Now, we’re thinking more in terms of a search engine. Browse around and make connections. I’m not sure where to take it from there, but that’s a great start.
This project is intended to be of benefit to the SFAA, and it is intended to be feasible with available resources. It’s time, therefore, to see if I can articulate those benefits. I particularly need to be able to explain how we can make use of volunteer help.
So. How do we put this project in practical terms?