The Printed Volumes
This project is not about the already-published books. However, there is a significant revenue opportunity which can be supported by a simple piece of the data design.
The artifact here is the Table of Contents of each book. Each Table of Contents is one to three pages long and could easily be transcribed into, for example, a spreadsheet. Here is an excerpt from Volume Three; each entry gives the Dwight number, the person, and the page number in that volume:
THE JOHN STRONG, JR. LINE
- #3, John Strong, Jr., Page 1
- #39, Hannah Strong, Page 3
- #41, John Strong, Page 12
- …
THE ABIGAIL STRONG CHAUNCEY LINE
- #15, Abigail Strong, Page 199
- #23954, Abigail Brewer, Page 199
- #24042, Nathaniel Chauncey, III, Page 215
and so on.
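Because the entries follow a regular pattern, transcription could be partly automated. The Python sketch below assumes the typed-in text keeps exactly the shape shown in the excerpt: a capitalized "THE ... LINE" heading, followed by entries of the form "- #number, name, Page number". It turns that text into one spreadsheet-ready CSV row per person. The function names and column names are illustrative only, not part of any existing tool. Names such as "John Strong, Jr." contain commas, so the pattern anchors on the final ", Page" instead of splitting on commas.

```python
import csv
import io
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed line shapes, taken from the Volume Three excerpt.
HEADING = re.compile(r"^(THE .+ LINE)\b")
ENTRY = re.compile(r"^-\s*#(\d+),\s*(.+),\s*Page\s+(\d+)\s*$")


@dataclass
class TocEntry:
    volume: int
    line: str    # descendant line heading, e.g. "THE JOHN STRONG, JR. LINE"
    dwight: int  # Dwight number
    name: str
    page: int


def parse_toc(text: str, volume: int) -> List[TocEntry]:
    """Turn transcribed Table of Contents text into one entry per person."""
    entries: List[TocEntry] = []
    current_line: Optional[str] = None
    for raw in text.splitlines():
        raw = raw.strip()
        heading = HEADING.match(raw)
        if heading:
            current_line = heading.group(1)
            continue
        entry = ENTRY.match(raw)
        if entry and current_line is not None:
            dwight, name, page = entry.groups()
            entries.append(TocEntry(volume, current_line, int(dwight), name, int(page)))
        # Any other line ("- …", "and so on.") is skipped.
    return entries


def to_csv(entries: List[TocEntry]) -> str:
    """Serialize entries as CSV; fields containing commas are quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["volume", "line", "dwight", "name", "page"])
    for e in entries:
        writer.writerow([e.volume, e.line, e.dwight, e.name, e.page])
    return buf.getvalue()


SAMPLE = """\
THE JOHN STRONG, JR. LINE
- #3, John Strong, Jr., Page 1
- #39, Hannah Strong, Page 3
THE ABIGAIL STRONG CHAUNCEY LINE
- #15, Abigail Strong, Page 199
- #24042, Nathaniel Chauncey, III, Page 215
"""

if __name__ == "__main__":
    print(to_csv(parse_toc(SAMPLE, volume=3)))
```

Keeping the descendant line as a column means the same rows can later answer "everything in this line" as well as "everything for this Dwight number".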
The Langbehn Database, Again
Let’s look at The Langbehn Database from a data design perspective.
The Langbehn Database is represented in a collection of tables defined by TNG (The Next Generation of Genealogy Sitebuilding). The TNG product site is here: http://lythgoes.net/genealogy/software.php. The software is well supported, with an extremely active mailing list.
As part of our data design, I intend to add a link to each TNG-generated page showing our other resources related to that person. The information could also be part of a tooltip popup. The implementation details are outside this data design; I’ve done this sort of thing with TNG before.
What this data design DOES need to provide is an SQL query that returns that information, given the relevant Dwight reference number.
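A minimal sketch of such a query follows. It assumes a hypothetical `resources` table holding one row per resource per Dwight number, plus a `collections` table, named to fit CakePHP conventions; none of these tables or columns exist yet. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so it runs anywhere. The SELECT itself is plain SQL, although MySQL drivers typically use `%s` rather than `?` as the parameter placeholder. Apart from Abigail Strong and Hannah Strong, taken from the Volume Three excerpt, the sample rows and URLs are made up.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE collections (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE resources (
    id INTEGER PRIMARY KEY,
    collection_id INTEGER NOT NULL REFERENCES collections (id),
    dwight_number INTEGER NOT NULL,
    description TEXT NOT NULL,  -- descriptive link text; the number alone means nothing
    url TEXT NOT NULL
);
CREATE INDEX idx_resources_dwight_number ON resources (dwight_number);
"""

# Everything we hold for one Dwight number, grouped by collection.
RESOURCES_FOR_DWIGHT = """
SELECT c.title, r.description, r.url
FROM resources AS r
JOIN collections AS c ON c.id = r.collection_id
WHERE r.dwight_number = ?
ORDER BY c.title, r.description
"""


def resources_for_dwight(conn: sqlite3.Connection, dwight: int) -> List[Tuple[str, str, str]]:
    """Rows of (collection title, link text, URL) for one Dwight number."""
    return conn.execute(RESOURCES_FOR_DWIGHT, (dwight,)).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with sample rows (hypothetical URLs)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO collections (id, title) VALUES (?, ?)",
        [(1, "Printed Volumes"), (2, "Seven-Volume Index")],
    )
    conn.executemany(
        "INSERT INTO resources (collection_id, dwight_number, description, url) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, 15, "Abigail Strong, Volume Three, page 199", "/volumes/3/page/199"),
            (2, 15, "Abigail Strong in the Seven-Volume Index", "/index/15"),
            (1, 39, "Hannah Strong, Volume Three, page 3", "/volumes/3/page/3"),
        ],
    )
    return conn


if __name__ == "__main__":
    for row in resources_for_dwight(build_demo_db(), 15):
        print(row)
```

The TNG-side link or tooltip would only need to call this one query with the person's Dwight number; the index on `dwight_number` keeps that lookup cheap.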
There is a second area which might be a good place for formal Data Design. This area is not time critical, and is separate from the paper document solutions being discussed elsewhere. This design has the aim of revenue generation.
I will describe the specific artifacts which currently exist, and work towards the end user experience. Additional background material is at: http://otscripts.com/category/data-design/.
I have been thinking in terms of a home-grown solution. I now find (via private correspondence) that a comprehensive paper-document-management solution may exist.
Where does that leave us? I’ll lay out some thoughts for discussion.
The intention of this project is to make use of the Historian Archives. I'm taking what I think is a Business Intelligence approach: I'm trying to draw results that are useful to people out of a mountain of information. The ideal would be to publish a dozen new volumes of genealogy, but that's just not practical.
As the first step, we connect the Seven-Volume Index and the Langbehn Database. That is a great outcome and within current capabilities. It will take some time, but it’s feasible.
As a second step, we connect all other submitted GEDCOM files. This is a huge gain. Until now, I simply have not known what to do with submitted information, because I had been thinking in terms of creating a coherent book or some electronic equivalent.
Now, we’re thinking more in terms of a search engine. Browse around and make connections. I’m not sure where to take it from there, but that’s a great start.
This project is intended to be of benefit to the SFAA, and it is intended to be feasible with available resources. It’s time, therefore, to see if I can articulate those benefits. I particularly need to be able to explain how we can make use of volunteer help.
So. How do we put this project in practical terms?
Seven-Volume Index
What is the best way to achieve a usable result which any member of the Association can see as clearly beneficial? I think the answer is the Seven-Volume Index. An ideal result would be to connect it up to the Langbehn database, but I’m not sure there’s time for that to happen.
One key challenge is digitizing the Index itself. I have much of the original index on floppy disks, in a custom file format produced by software that ran under MS-DOS, probably on a PC/AT. There's no way of knowing whether those disks are complete, or whether they represent the final copy after proofing.
I therefore think the best approach is to scan the printed index and OCR the text. The index is printed on 8.5×11″ (standard letter size) paper: 861 pages, four columns per page. That four-column layout may make converting the scans into a usable text file a major challenge.
Even so, I think the digitized index is the right place to start. Getting a working text file is independent of the Data Design, and can therefore begin right away.
I view our project (the Web site) as having two distinct phases:
- We have the User Experience. I believe this is what we want to design first.
- We have the import process. The data creation/import process is undoubtedly different for each type of artifact or collection.
Project Design 01 talked about the first item. This post talks about the second item.
Project Constraints
The solution is required to be MySQL with the CakePHP framework for PHP, and my self-study is aimed in that direction. I am assuming this won't prevent implementing the same data design in Oracle in parallel.
CakePHP has some slightly weird table-naming expectations. I have found that if you follow the CakePHP expectations to the letter, you can get things up and running in extremely rapid fashion. If not, you don’t. I am therefore dictating that the actual MySQL implementations follow the naming conventions documented at http://book.cakephp.org/2.0/en/models/associations-linking-models-together.html.
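To make the naming rules concrete, here is a rough Python sketch of the CakePHP 2.x conventions applied to a few hypothetical models (`Collection`, `Resource`, `FamilyLine`). Tables are lowercase, underscored plurals of the model name; foreign keys are the singular underscored model name plus `_id`; and a hasAndBelongsToMany join table combines both table names in alphabetical order. The pluralizer here is deliberately naive. CakePHP's own Inflector handles irregular words (Person becomes people, for example), so real names should still be checked against the documentation linked above.

```python
import re


def underscore(model: str) -> str:
    """Simple CamelCase model name to lower_snake_case: 'FamilyLine' -> 'family_line'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", model).lower()


def pluralize(word: str) -> str:
    """Naive English plural; CakePHP's Inflector covers far more cases."""
    if word.endswith("y") and len(word) > 1 and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    if word.endswith(("s", "x", "ch", "sh")):
        return word + "es"
    return word + "s"


def table_name(model: str) -> str:
    """Model 'Collection' -> table 'collections'."""
    return pluralize(underscore(model))


def foreign_key(model: str) -> str:
    """Column pointing at a model (belongsTo / hasMany): 'Collection' -> 'collection_id'."""
    return underscore(model) + "_id"


def join_table(model_a: str, model_b: str) -> str:
    """hasAndBelongsToMany join table: both table names in alphabetical order."""
    first, second = sorted((table_name(model_a), table_name(model_b)))
    return f"{first}_{second}"


if __name__ == "__main__":
    for model in ("Collection", "Resource", "FamilyLine"):
        print(model, table_name(model), foreign_key(model))
    print(join_table("Resource", "Collection"))
```

Every table also gets an integer primary key named `id`, which CakePHP assumes by default.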
CakePHP does have a good way of building a solid Web site based on highly normalized tables. CakePHP thrives on a correctly-normalized design. It’s very convenient, given the curriculum sequence, that I intend to create the data design first and build the Web site out from the schema.
I would ideally like to have some sort of useful demonstration up and running for the Strong Family Reunion.
This means I need to find a way to articulate what I'm trying to accomplish, and to work out what it means for people willing to help. These would be genealogy people, not high-tech computer people.
The Project
Can we boil all of this background down to a reasonable project? I have hopes that we can.
I envision the following primary ways of looking at the Historian Archives:
- Browse and search our various online genealogy databases. We receive a file in GEDCOM format and import it into TNG for Web display.
- For a given Dwight number, browse a list of collections tied to that Dwight number. You can click any item in the list to get to that collection. The Dwight number is meaningless of itself; it is in effect a surrogate key. All Dwight-based links need to have suitable descriptive text.
- For a given name, browse a list of resources tied to that name. This may be tricky to display in useful and coherent fashion. This may be a good place to separate presentation from content.
- For a given descendant line, browse all resources tied to that descendant line.
- Each display page should likely contain links to each of the above, tuned to that specific page.
- We have a table of contents page providing links to each collection.
- We provide a means (probably an XML site map) for search engines to correctly index the site(s).
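The site map in the last item could follow the sitemaps.org protocol: a `urlset` of `url` elements, each holding a `loc`. The Python sketch below builds one with the standard-library `xml.etree.ElementTree`. The base URL and the `/contents`, `/collections/...` and `/dwight/...` path layout are hypothetical placeholders for whatever the CakePHP routes turn out to be.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def site_paths(collection_slugs: Iterable[str], dwight_numbers: Iterable[int]) -> Iterator[str]:
    """Hypothetical layout: contents page, one page per collection, one per Dwight number."""
    yield "/contents"
    for slug in collection_slugs:
        yield f"/collections/{slug}"
    for number in sorted(set(dwight_numbers)):  # de-duplicate, stable order
        yield f"/dwight/{number}"


def build_sitemap(base_url: str, paths: Iterable[str]) -> str:
    """Return a sitemaps.org 0.9 document listing base_url joined with each path."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + "/" + path.lstrip("/")
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    paths = site_paths(["printed-volumes", "seven-volume-index"], [3, 39, 41, 3])
    print(build_sitemap("https://example.org/", paths))
```

Generating the site map from the same tables that drive the pages keeps it in step with the content as new collections are imported.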