This page is the starting point for major discussions on a genealogical data model for BetterGEDCOM. Also, thoughts, comments and other input on the discussion tab of this page above will be added here. If you have significant input to the BetterGEDCOM data model, please add it here. If you have a comment, a minor point or something you don't feel is quite clear enough to add, please place it on the discussion tab.
Data Modeling Introduction
To begin with, you might want to study the Data Modeling article at Wikipedia to understand how data modeling is supposed to work. This is meant to be useful and instructive rather than trying to get anyone to adhere to a rigid process or structure.
Bottom Up vs. Top Down
There are those who say this effort should take each data element, one at a time, and build a data model from the bottom up. Others say this approach is foolhardy, and that one must start with an overall philosophy first. There are ample opportunities for both approaches, and the discussions in each will obviously influence the other. Whether you favor a "big picture" or a "devil in the details" approach, you should be able to find a place to hash out your ideas.
Bottom Up Approach
This approach states that while the data model is of critical importance, it will be built incrementally, element by element, rather than with any preconceived ideas from other models. All other models are a source of inspiration but none is a blueprint. This approach starts with each core element, carefully defines it and slowly builds up the data model. In this way it seeks to build the best practical model rather than one that matches any particular philosophy from the outset.
Top Down Approach
This approach begins with a particular philosophy and lets this philosophy develop the elements of the data model. Core to this approach is a study of previous data models and understanding what each is trying to achieve. Certainly no work here can be done without some knowledge of data models and their philosophies.
Add Your Voice To the Discussion
This is a community project, and your opinion is valued and indeed needed. If you don't see the part of the discussion you think is important, it is because you haven't added it. Please jump in and participate, or your valuable input will be missed! If you have general comments to make or don't know where to put your ideas, just use the Discussion tab above to write your thoughts, and they will be added here by the moderators. If you see a section that pertains to an issue you feel you can elaborate on or help with, please go ahead and edit this page. Any ancillary elaboration that would clutter this main discussion can be added via the Discussion tab above. A rule of thumb: If your comments clarify or help define something better, add them to this page. If your comments are part of a debate, use the Discussion tab.
As a starter, I give some issues that I can see with the current GEDCOM and most software.
- language specific naming
- hierarchical locations suffer from historical changes (conquests, wars, treaties, etc)
- not everybody agrees to historical changes: some consider a period as a foreign occupation while others may see this as justified annexation. A typical recent example could be the Falkland Islands, under the British flag, but considered by some to be occupied territory from Argentina.
- one physical location can have multiple hierarchies, depending on the context: administrative, judicial, religious, ...
- matching between such hierarchies would be great (e.g. to match a baptism (religious) to a birth (administrative)), but the covered areas are not always a match (e.g. parish vs. town)
- some locations need to be more an area than a point location, e.g. to state where a profession was performed, or a title was held (a priest for a parish, a town responsible, people from the nobility)
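To make the points above a bit more concrete, here is a minimal sketch of a place record that carries language-specific names, one parent per hierarchy, and an area flag. All class and field names are hypothetical and only for illustration; nothing here is part of GEDCOM or of any agreed BetterGEDCOM model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Place:
    """One physical location; every field name here is hypothetical."""
    place_id: str
    # Names keyed by language code, e.g. {"nl": "Brussel", "fr": "Bruxelles"}
    names: dict[str, str] = field(default_factory=dict)
    # Parent place id per hierarchy kind ("administrative", "religious", ...)
    parents: dict[str, str] = field(default_factory=dict)
    # True when the place is an area (parish, county) rather than a point
    is_area: bool = False


def parent_in(place: Place, hierarchy: str) -> Optional[str]:
    """Return the parent place id within one hierarchy, or None if unknown."""
    return place.parents.get(hierarchy)


church = Place(
    place_id="P1",
    names={"en": "St. Mary's"},
    parents={"religious": "PARISH-7", "administrative": "TOWN-3"},
)
print(parent_in(church, "religious"))  # PARISH-7
print(parent_in(church, "judicial"))   # None
```

Keeping one parent per hierarchy kind lets the same physical place sit in a parish and in a town at the same time, without forcing the two areas to coincide.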
Comments are welcome!
There's a big database "GOV - the genealogical gazetteer" covering Germany and quite a bit of the rest of Europe here:
http://gov.genealogy.net/Locale.do?language=en&country=us
I think it's been designed and is maintained by the people behind
http://www.genealogy.net/
This is mostly a German project, a vast resource for everything concerning genealogy. There are a lot of intelligent and knowledgeable people involved there -- at least that's what I'm assuming ;-), being just a user of their website(s) and services. But maybe you could get some of these people to contribute here? I'm sure they could add valuable perspectives.
Contact addresses I've found on the website are:
gov-support@genealogy.net
vorstand@compgen.de
Mailing lists of potential interest can be found at:
http://list.genealogy.net/mm/listinfo/gedcom-l
http://list.genealogy.net/mm/listinfo/genealogie-programme
The GEDCOM 5.5EL (meaning "Extended Locations") proposal was developed by the "Society For Computer Genealogy" and a number of authors of German genealogy software.
The problems you have been discussing of time-dependent spatial referencing (by geographical identifiers and/or coordinates) are not unique to genealogy, of course. Having been involved in ISO/TC 211, Geographic information/Geomatics, for over a decade I am somewhat biased, but I think that our suite of standards (including ISO 19136:2007, Geographic information - Geography Markup Language) will cater for a lot of your needs, and anything that is missing can be added to the standards.
While ISO standards have to be bought, unfortunately, there is a comprehensive standards guide that can be downloaded from the ISO/TC 211 web site at: http://www.isotc211.org/
Regards
Antony
I do NOT advocate that place entries within BetterGEDCOM be tied to ANY external or universal geographic mapping system or anything like that. People should absolutely be able to specify places as they want to, whether anyone else understands them or not.
I am merely suggesting we add support for passing along information about how a particular geographic mapping system refers to the place the original user means. The mapping of that place, conformance in recording that information, etc., should and would be handled entirely within the genealogical application the person was using.
Remember BetterGEDCOM (for purposes of this initial project) is merely a way to store data uniformly between programs. We're not trying to add features to actual genealogy software but we do want to accommodate and pass along all data that has been accumulated. Right now there are lots of ways users map the places in their genealogy databases, and all I'm advocating is a way for BetterGEDCOM to be able to receive that mapping information and pass it along.
Regarding URIs, I am not sure this is needed or appropriate, as that sort of thing is something the app developer would have to decide upon and implement. However, if URI reference format is appropriate, then great.
Dallan said, "The problem with using user-entered data as the basis for your place database is that users are not good at entering place data..."
Yes, I know this, and this was actually debated extensively during one of the GEDCOM 5.x revisions 15 or 20 years ago. I've read some of the notes. On the one hand, it's a nice concept that everyone would have nicely identified places that conform to some wonderful, universal, noncontroversial geographical mapping system somewhere. But in practice, things just don't work that way. Even if such a database existed, genealogists would still want to be able to cite inexact, vague or idiosyncratic places. I fully support giving folks the option to pass this sort of geographical mapping equivalency information but I completely oppose making such a system mandatory. This would be like insisting that everyone map their relatives to people who have been recorded in census records. It sounds vaguely ok but in practice it's a horrible idea.
WOW - This database is incredible. Germany is the most complicated country I can think of with respect to historical places, and these guys are doing a terrific job. THANK YOU for sharing it.
greglamberson,
I'm not arguing that every user-entered place must correspond to a standardized place. Just that "automatic gazetteer generation" (google it sometime) using user-entered places as your source has problems associated with it. If a records-manager can map a user-entered place to a standardized place, then the user-entered place can be shown on a map without the user having to enter lat/lon themselves; for user-entered places that can't be mapped you don't get this feature. That's all.
BetterGEDCOM won't care how places are entered. BetterGEDCOM is a genealogical Switzerland: It is neutral on matters of preference for one system or another. Thus, however location information was entered, BetterGEDCOM doesn't care. If there were some location qualifier that the software program used, largely guiding users to enter places that were identifiable, then great. BetterGEDCOM doesn't care. However, the information that identified that place in that system should be able to be exported with the location value and subsequently imported into another software program. Period.
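As a rough illustration of "exported with the location value and subsequently imported", here is a minimal sketch in which the place is whatever the user typed, plus an optional list of references into external mapping systems. The class names, the JSON layout and the "example-gazetteer" system name are all assumptions made up for this sketch; no BetterGEDCOM format is defined here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ExternalRef:
    system: str      # name of the mapping system, chosen by the exporting app
    identifier: str  # that system's own identifier for the place


@dataclass
class PlaceValue:
    text: str  # exactly what the user entered, never validated
    refs: list[ExternalRef] = field(default_factory=list)  # optional, may be empty


def export_place(place: PlaceValue) -> str:
    """Serialize the place and any references it carries (hypothetical layout)."""
    return json.dumps(asdict(place))


def import_place(payload: str) -> PlaceValue:
    """Rebuild the place; references are passed along untouched."""
    raw = json.loads(payload)
    return PlaceValue(
        text=raw["text"],
        refs=[ExternalRef(**r) for r in raw.get("refs", [])],
    )


vague = PlaceValue(text="somewhere near the old mill")
mapped = PlaceValue(
    text="Townsville",
    refs=[ExternalRef(system="example-gazetteer", identifier="12345")],
)
assert import_place(export_place(vague)) == vague
assert import_place(export_place(mapped)) == mapped
```

The vague place round-trips just as well as the mapped one, which is the point: the references are optional cargo, not a requirement.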
To be useful, would software programmers have to use this feature? Absolutely. Are they required to? Absolutely not. Is it within BetterGEDCOM's scope to require software developers adopt some location management system? Not within this initial project.
Here's an interesting idea: Could we develop a BetterGEDCOM Location Extension that did specify the sort of system you refer to? Absolutely. Such a project would be exactly the sort of thing that BetterGEDCOM would like to see develop as a secondary project. Within the scope of this initial project, however, such an effort would be counterproductive in that it would force developers to adopt a particular approach to location management, and that would be a deal-breaker for lots of vendors.
Does this resolve our differences?
xvdessel:
I have expressed it in the "Location Entity Over Time" thread: http://bettergedcom.wikispaces.com/message/view/Location+entity/30668879
but I'd like to re-emphasize that I'm totally against a location-time stamp combination.
I feel a specific location should be a single entity. If it changes names over time, then that should be within the entity. My example (using extended GEDCOM) is:
0 @P43@ PLAC
1 NAME Townsville
2 DATE From 1832 to 1912
1 NAME Citiesville
2 DATE From 1912
1 LATI N18.150944
1 LONG E168.150944
I believe this will quite easily handle DearMYRTLE's concern.
In other words, the entity should be "Location" and NOT "Time-Location".
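The same idea as the GEDCOM snippet above can be sketched as one Location object that holds its dated names internally. This is only an illustration under stated assumptions: the class and method names are hypothetical, dates are reduced to whole years, and the end year of a name is treated as exclusive so that 1912 resolves to the newer name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatedName:
    name: str
    from_year: int
    to_year: Optional[int] = None  # None means the name is still in use


@dataclass
class Location:
    location_id: str
    names: list[DatedName]
    latitude: float
    longitude: float

    def name_in(self, year: int) -> Optional[str]:
        """Return the name in use in the given year, or None if none applies."""
        for n in self.names:
            # Assumption: to_year is exclusive, so a rename year gets the new name
            if n.from_year <= year and (n.to_year is None or year < n.to_year):
                return n.name
        return None


p43 = Location(
    location_id="P43",
    names=[DatedName("Townsville", 1832, 1912), DatedName("Citiesville", 1912)],
    latitude=18.150944,
    longitude=168.150944,
)
print(p43.name_in(1850))  # Townsville
print(p43.name_in(1950))  # Citiesville
```

There is still exactly one entity, P43, so no connector between "the same place at different times" is needed.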
Adding time onto any entity (known as time-stamping) gets into huge complexities. If identical places in different times are kept separate, then how will it be known they are identical? You'll have to add a connector - the first level of complexity. I was involved in an SAP data warehouse project, where the designers insisted on time-stamping the data. It was a disaster.
If a person changes name, or sex, or hair color, do you want to make them a "time-person" entity?
I am also one to believe that events can and should be assigned to locations, rather than just people, when they pertain to the location (e.g. Fire in 1848), but that's another matter.
People are messy and uncontrolled, and without structure, they will create a mess of places. That will make it impossible for a program to put together a useful Place Index. Putting data together from multiple people and trying to organize places will be a disaster.
I'm for structure here.
I would not start a secondary project before the first one's even off the ground.
If a secondary project is required, then the model selected is much too complicated.
Even talking about a gazetteer of places is way beyond what BetterGEDCOM is about.
Adding a time element to a location entity is not at all the same thing as time-stamping. Time-stamping refers to adding an indicator to the data specifically related to when the data was last changed.
If the key for the Location entity was a combination of a time-place, yes, that could be a problem. I think the key for nearly every entity we use will be a UUID. Issues of duplication you raise will therefore be moot.
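A minimal sketch of what a UUID key buys, assuming a hypothetical LocationRecord class: the key is generated independently of the name and of any time period, so two records that look alike never collide, and a time period is just another attribute.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LocationRecord:
    name: str
    from_year: Optional[int] = None  # time is an attribute, not part of the key
    to_year: Optional[int] = None
    key: uuid.UUID = field(default_factory=uuid.uuid4)


a = LocationRecord("Townsville", 1832, 1912)
b = LocationRecord("Townsville", 1832, 1912)

# Identical content, yet distinct keys: records are indexed by key alone
index = {rec.key: rec for rec in (a, b)}
assert a.key != b.key
assert len(index) == 2
```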
Regarding location structure, that's the job of the software app. We're just trying to provide a place for data to reside, however it looks when we get it.
Regarding other projects, there are several future projects in mind. That has no bearing on this project.
What about working with GoldBug? Art has passed away, but his coder still works on the AniMap product.
I would therefore propose the naming convention of a "time-location" as follows:
A time-location is a point or an area that corresponds to a commonly known location during a specific period of time. One time-location could have multiple area definitions if its area changed over the period of its existence. A time-location could be independent of any organization (e.g. nature locations), but most time-locations should be related to an organization structure (administrative, judicial, religious, ...) which often implies a hierarchical relation to one or more time-locations of a higher level in the same organization structure.
For users, this could mean the following: If a software program supports such a reference database system, then you would first select a number of time-locations that you want to use in your data. If you selected one time-location (e.g. your currently existing town), the software could recommend that you also select historical versions of that location, or related areas that cover a similar location (e.g. a parish).
Whenever you then want to enter a location for an event (e.g. a birth), you would be presented with locations that are time-relevant: It does not make sense that a birth in 1970 took place in a town that only existed up to 1920. Moreover, the system could insist that a baptism should be located in a parish rather than a town. But still, as both a baptism and a birth event have time-locations with deducible coordinates, good software could match one to the other.
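The time-relevant selection described above could look roughly like this. It is a sketch only: the TimeLocation fields, the candidates function and the sample places are hypothetical, periods are reduced to whole years, and the end year is treated as inclusive ("existed up to 1920").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimeLocation:
    name: str
    structure: str            # "administrative", "religious", ...
    start_year: int
    end_year: Optional[int]   # None means the time-location still exists


def candidates(all_locations, event_year, structure=None):
    """Return time-locations that existed in event_year, optionally
    restricted to one organization structure."""
    result = []
    for loc in all_locations:
        exists = loc.start_year <= event_year and (
            loc.end_year is None or event_year <= loc.end_year
        )
        if exists and (structure is None or loc.structure == structure):
            result.append(loc)
    return result


selected = [
    TimeLocation("Old Town", "administrative", 1800, 1920),
    TimeLocation("New Town", "administrative", 1921, None),
    TimeLocation("St. Mary's Parish", "religious", 1750, None),
]

# A birth in 1970 is only offered the town that existed then
print([loc.name for loc in candidates(selected, 1970, "administrative")])
# ['New Town']
# A baptism is restricted to the religious structure
print([loc.name for loc in candidates(selected, 1970, "religious")])
# ["St. Mary's Parish"]
```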
What is important for the end user here is that such a system implies stricter data entry. Just have a look at this page:
http://en.wikipedia.org/wiki/Alexandria_%28disambiguation%29
and you will understand that software that simply stores "Alexandria" as a location can never expect to have such data exported and reused with any degree of usability. Hence, the end user should first indicate which "Alexandria" he commonly wants to refer to (e.g. Town=Alexandria, County=Jefferson County, State=New York, Country=United States of America) and for which time periods (depending on the historical changes related to this town). When exporting, the exact data (in a format to be decided, but this is less relevant to the user) can then be provided, which ensures that the receiver (software and hence the end user) cannot misunderstand that time-location.
To all previous posters:
are there any open initiatives today that target such a time-location database?
Xavier
An example syntax could then be:
timeloc:<server>/<unique-time-location-id>
or
timeloc:
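For the first form only, timeloc:<server>/<unique-time-location-id>, a receiving program could split the reference as sketched below. The function and type names and the example server are hypothetical, and this does not cover the second form, which is not spelled out above.

```python
from typing import NamedTuple


class TimeLocRef(NamedTuple):
    server: str
    location_id: str


def parse_timeloc(ref: str) -> TimeLocRef:
    """Split 'timeloc:<server>/<id>' into its server and id parts."""
    prefix = "timeloc:"
    if not ref.startswith(prefix):
        raise ValueError(f"not a timeloc reference: {ref!r}")
    # Split on the first slash only, so the id itself may contain slashes
    server, sep, location_id = ref[len(prefix):].partition("/")
    if not sep or not server or not location_id:
        raise ValueError(f"expected timeloc:<server>/<id>, got {ref!r}")
    return TimeLocRef(server, location_id)


ref = parse_timeloc("timeloc:places.example.org/TL-000123")
print(ref.server)       # places.example.org
print(ref.location_id)  # TL-000123
```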