Second Design 03

The Printed Volumes

This project is not about the already-published books. However, there is a significant revenue opportunity which can be supported by a simple piece of the data design.

Our artifact is the Table of Contents of each book. Each Table of Contents is one to three pages long and can easily be transcribed into, for example, a spreadsheet. Here is an excerpt from Volume Three:

THE JOHN STRONG, JR. LINE (below is Dwight #, the person, and page number in this book):

  • #3, John Strong, Jr., Page 1
  • #39, Hannah Strong, Page 3
  • #41, John Strong, Page 12

THE ABIGAIL STRONG CHAUNCEY LINE

  • #15, Abigail Strong, Page 199
  • #23954, Abigail Brewer, Page 199
  • #24042, Nathaniel Chauncey, III, Page 215

and so on.
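As a rough sketch of how such a transcription might be loaded, the Python below parses entries written in the same shape as the excerpt. The record layout and the names `TocEntry` and `parse_toc` are my own assumptions for illustration, not part of the data design.

```python
import re
from dataclasses import dataclass


# Hypothetical record layout; the field names are assumptions.
@dataclass(frozen=True)
class TocEntry:
    line: str           # descendant line heading the entry appears under
    dwight_number: int  # Dwight reference number
    name: str           # person's name as printed
    page: int           # page number within the volume


# Matches entries such as "#3, John Strong, Jr., Page 1".
# The name may itself contain commas, so it is matched greedily.
ENTRY = re.compile(r"^#(\d+),\s*(.+),\s*Page\s+(\d+)$")


def parse_toc(text: str) -> list[TocEntry]:
    """Turn transcribed Table of Contents text into a list of records."""
    entries = []
    current_line = ""
    for raw in text.splitlines():
        row = raw.strip().lstrip("•").strip()
        if not row:
            continue
        match = ENTRY.match(row)
        if match:
            number, name, page = match.groups()
            entries.append(TocEntry(current_line, int(number), name, int(page)))
        elif row.startswith("THE ") and " LINE" in row:
            # Keep only the heading itself, dropping any trailing note.
            current_line = row.split(" LINE")[0] + " LINE"
    return entries


SAMPLE = """
THE JOHN STRONG, JR. LINE (below is Dwight #, the person, and page number in this book):
  • #3, John Strong, Jr., Page 1
  • #39, Hannah Strong, Page 3
  • #41, John Strong, Page 12
THE ABIGAIL STRONG CHAUNCEY LINE
  • #15, Abigail Strong, Page 199
  • #23954, Abigail Brewer, Page 199
  • #24042, Nathaniel Chauncey, III, Page 215
"""

entries = parse_toc(SAMPLE)
```

Each record carries its descendant line, so the same rows could later feed both a per-person lookup and a per-line lookup.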

Posted by admin - June 29, 2014 at 12:25 pm

Categories: Data Design   Tags:

Second Design 02

The Langbehn Database, Again

Let’s look at The Langbehn Database from a data design perspective.

The Langbehn Database is stored in a collection of tables defined by TNG (The Next Generation of Genealogy Sitebuilding). The TNG product site is here: http://lythgoes.net/genealogy/software.php. The software itself is well supported, with an extremely active mailing list.

As part of our data design, I intend to add a link to each TNG-generated page showing our other resources related to that person. The information could also be part of a tooltip popup. The implementation details are outside this data design; I’ve done this sort of thing with TNG before.

What we DO need from this data design is provision for an SQL query that returns that information, given the relevant Dwight reference number.
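To make that requirement concrete, here is a minimal sketch of such a lookup. It assumes a single hypothetical `resources` table keyed by Dwight number; the table, its columns, and the function name are inventions for illustration, and SQLite (from Python's standard library) stands in for MySQL so the example is self-contained.

```python
import sqlite3


def related_resources(conn, dwight_number):
    """Return (collection, description, page) rows for one Dwight number."""
    return conn.execute(
        "SELECT collection, description, page "
        "FROM resources "
        "WHERE dwight_number = ? "
        "ORDER BY collection, page",
        (dwight_number,),
    ).fetchall()


# In-memory database standing in for the real MySQL schema (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resources ("
    " id INTEGER PRIMARY KEY,"
    " dwight_number INTEGER NOT NULL,"
    " collection TEXT NOT NULL,"
    " description TEXT NOT NULL,"
    " page INTEGER)"
)
conn.execute("CREATE INDEX idx_resources_dwight ON resources (dwight_number)")
conn.executemany(
    "INSERT INTO resources (dwight_number, collection, description, page) "
    "VALUES (?, ?, ?, ?)",
    [
        (3, "Printed Volumes: Volume Three", "John Strong, Jr.", 1),
        (39, "Printed Volumes: Volume Three", "Hannah Strong", 3),
        (15, "Printed Volumes: Volume Three", "Abigail Strong", 199),
    ],
)

rows = related_resources(conn, 3)
```

The description column matters because the Dwight number means nothing to a reader on its own; whatever the final schema looks like, the query should return text suitable for a link or tooltip.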

Posted by admin - at 12:10 pm

Categories: Data Design   Tags:

Second Design 01

There is a second area which might be a good place for formal Data Design. This area is not time critical, and is separate from the paper document solutions being discussed elsewhere. This design has the aim of revenue generation.

I will describe the specific artifacts which currently exist, and work towards the end user experience. Additional background material is at: http://otscripts.com/category/data-design/.

Posted by admin - at 11:46 am

Categories: Data Design   Tags:

Complete Design Change: SFAA Perspective 02

I have been thinking in terms of a home-grown solution. I now find (via private correspondence) that a comprehensive paper-document-management solution may exist.

Where does that leave us? I’ll lay out some thoughts for discussion.

Posted by admin - at 11:02 am

Categories: Data Design   Tags:

SFAA Perspective 01

The intention of this project is to make use of the Historian Archives. I’m taking what I think is a Business Intelligence approach: I’m trying to draw results that are useful to people out of the mountain of information. The ideal would be to publish a dozen new volumes of genealogy, but that’s just not practical.

As the first step, we connect the Seven-Volume Index and the Langbehn Database. That is a great outcome and within current capabilities. It will take some time, but it’s feasible.

As a second step, we connect all other submitted GEDCOM files. This is a huge gain. Before now, I just have not known what to do with submitted information. This is because I’ve been thinking in terms of creating a coherent book or some electronic equivalent.

Now, we’re thinking more in terms of a search engine. Browse around and make connections. I’m not sure where to take it from there, but that’s a great start.

Posted by admin - June 28, 2014 at 3:19 pm

Categories: Data Design   Tags:

Project Design 04

This project is intended to be of benefit to the SFAA, and it is intended to be feasible with available resources. It’s time, therefore, to see if I can articulate those benefits. I particularly need to be able to explain how we can make use of volunteer help.

So. How do we put this project in practical terms?

Posted by admin - at 2:49 pm

Categories: Data Design   Tags:

Project Design 03

Seven-Volume Index

What is the best way to achieve a usable result which any member of the Association can see as clearly beneficial? I think the answer is the Seven-Volume Index. An ideal result would be to connect it up to the Langbehn database, but I’m not sure there’s time for that to happen.

One key challenge is digitizing the Index itself. I have much of the original index on floppy disks in a custom file format, written under MS-DOS, probably on a PC/AT. There’s no way of knowing if those disks are complete, or if they represent the final copy after proofing.

I therefore think the best approach is to scan and OCR the text. The index is on 8.5×11″ (standard letter size) paper. There are 861 pages. It is printed in four columns per page. This means there may be a major challenge in converting this to a usable text file.
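To illustrate the four-column problem, here is a small sketch of putting OCR output back into reading order. It assumes the OCR step reports each word with the position of its top-left corner in inches, and that the four columns split the page width evenly with no allowance for margins. Both are simplifying assumptions, and the names `Word` and `reading_order` are hypothetical.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge, in inches from the left side of the page
    y: float  # top edge, in inches from the top of the page


def reading_order(words, page_width=8.5, columns=4):
    """Sort words column by column, then top to bottom within a column."""
    column_width = page_width / columns

    def key(word):
        # Clamp so a word at the far right edge stays in the last column.
        column = min(int(word.x // column_width), columns - 1)
        return (column, word.y, word.x)

    return [word.text for word in sorted(words, key=key)]


scanned = [
    Word("b1", 2.3, 1.0),
    Word("a2", 0.5, 2.0),
    Word("a1", 0.5, 1.0),
    Word("d1", 6.6, 1.0),
]
ordered = reading_order(scanned)
```

Real scans are skewed and have margins, so the column boundaries would need to be measured from the pages rather than computed from the paper size.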

It seems to me, though, that the best approach is to start with the digitized index. Getting a working text file is independent of the Data Design, and can therefore be started right away.

Posted by admin - at 2:05 pm

Categories: Data Design   Tags:

Project Design 02

I view our project (the Web site) as having two distinct phases:

  • We have the User Experience. I believe this is what we want to design first.
  • We have the import process. The data creation/import process is undoubtedly different for each type of artifact or collection.

Project Design 01 talked about the first item. This post talks about the second item.

Posted by admin - at 1:45 pm

Categories: Data Design   Tags:

Project Design 01

Project Constraints

The solution is dictated to be MySQL with the CakePHP Framework for PHP. My self study is aimed in that direction. I am assuming this won’t create a barrier to implementing the data design in Oracle at the same time.

CakePHP has some slightly weird table-naming expectations. I have found that if you follow the CakePHP expectations to the letter, you can get things up and running in extremely rapid fashion. If not, you don’t. I am therefore dictating that the actual MySQL implementations follow the naming conventions documented at http://book.cakephp.org/2.0/en/models/associations-linking-models-together.html.
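As a reminder of what those conventions look like in practice, the sketch below derives two of the names CakePHP 2.x expects: a foreign key is the singular, underscored model name followed by `_id`, and a join table is the two table names in alphabetical order joined by an underscore. The helper functions and the model and table names are hypothetical, chosen only for illustration. Pluralizing table names is left to a human here, since CakePHP's own inflector handles irregular plurals.

```python
import re


def underscore(camel_case):
    """Convert a CamelCase model name to lower_case_with_underscores."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", camel_case).lower()


def foreign_key(model_name):
    """Foreign key column that points at the given model."""
    return underscore(model_name) + "_id"


def join_table(table_a, table_b):
    """Join table for a many-to-many link between two tables."""
    return "_".join(sorted([table_a, table_b]))


# Hypothetical model and table names for this project.
person_key = foreign_key("Person")
line_key = foreign_key("DescendantLine")
link_table = join_table("volumes", "people")
```

Under these rules a `Person` is referenced by `person_id`, a `DescendantLine` by `descendant_line_id`, and the table linking people to volumes is `people_volumes`.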

CakePHP does have a good way of building a solid Web site based on highly normalized tables. CakePHP thrives on a correctly-normalized design. It’s very convenient, given the curriculum sequence, that I intend to create the data design first and build the Web site out from the schema.

I would ideally like to have some sort of useful demonstration up and running for the Strong Family Reunion.

This means that I need to find a way to articulate what I’m trying to accomplish, and to figure out what this means for people willing to help. These would be genealogy people, not hi-tech computer people.

Posted by admin - at 1:25 pm

Categories: Data Design   Tags:

Mining the Historian Archives 08

The Project

Can we boil all of this background down to a reasonable project? I have hopes that we can.

I envision the following primary ways of looking at the Historian Archives:

  • Browse and search our various online genealogy databases. We receive a file in GEDCOM format and import it to TNG for Web display.
  • For a given Dwight number, browse a list of collections tied to that Dwight number. You can click any item in the list to get to that collection. The Dwight number is meaningless in itself; it is in effect a surrogate key. All Dwight-based links therefore need suitable descriptive text.
  • For a given name, browse a list of resources tied to that name. This may be tricky to display in useful and coherent fashion. This may be a good place to separate presentation from content.
  • For a given descendant line, browse all resources tied to that descendant line.
  • Each display page should likely contain links to each of the above, tuned to that specific page.
  • We have a table of contents page providing links to each collection.
  • We provide a means (probably an XML site map) for search engines to correctly index the site(s).
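For the last item in the list, a site map can be generated from a list of page addresses. This is a minimal sketch using Python's standard library; the `example.org` URLs and the function name are placeholders, and only the required `loc` element of the sitemap format is written.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML document (as text) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder addresses; the real site layout is not yet decided.
sitemap_xml = build_sitemap(
    [
        "https://example.org/dwight/3",
        "https://example.org/dwight/39",
    ]
)
```

One entry per Dwight-number page would give search engines a direct route to every person for whom we hold resources.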

Posted by admin - June 27, 2014 at 8:21 pm

Categories: Data Design   Tags:
