BetterGedcom - PLACE (BJD)

AdrianB38 2010-11-30T12:26:43-08:00

Place, Address or Location?

The GEDCOM standard talks about Places being "jurisdictional", which I've always thought of as being town, city, etc. The actual address (of an event or whatever) seems to be a finer definition of a location within the Place. Or at least, that's the way I use it (and many others that I see). Thus, Place "Crewe, Cheshire, England" can appear on numerous events, each with a different address.

I've always thought this distinction between Address and Place is arbitrary and weird - with intelligent programming, there is no reason why the Place shouldn't contain the full address (e.g. "335 High St, Crewe, Cheshire, England") and that specific Place be picked up if the software wants to run a query to report on all Places in "Crewe, Cheshire, England" (say).

If you want to go down this route (which personally I prefer), then to avoid confusion between the original uses of Place and Address, I would suggest use of a more neutral term: Location.

AdrianB38 2010-12-02T14:17:30-08:00

"you'd have to have a way to format Crewe's location record so that any record pointing to Crewe knows which name to use"

Indeed - it's fine if the thing doing the pointing has the same dates as the names - but it almost certainly won't. E.g. "John Smith lived in location L001 from 1960 to 1990"
L001 is "Crewe, Cheshire, England" from 1830 to 1973
L001 is "Crewe, East Cheshire, England" from 1974 onwards (not the real date by the way...)
So in that sentence "John Smith lived in location L001 from 1960 to 1990", what do we substitute for L001????
Partly I want to say - not my problem, it's an application issue, not a data model issue. But that's just cowardice. The data model has to allow ease of navigation to a solution. Trouble is - I'm not sure what solutions might look like...
How about....
"John Smith lived in Crewe, Cheshire, England (Crewe, East Cheshire, England from 1974) from 1960 to 1990"
???
or
"John Smith lived in Crewe (Cheshire, England to 1973; East Cheshire, England from 1974) from 1960 to 1990"
???
I'm not quite sure that either makes very good English - and I have no real feel for how easy it would be to navigate the data to provide those words.
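
For illustration only, here is a minimal Python sketch of how the first wording above might be produced from a location with dated names. The `DatedName` class, the `names_during` and `residence_sentence` helpers, and the sample data are all hypothetical (nothing here is part of any agreed model), and the dates are the illustrative ones from the example, not the real ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatedName:
    text: str
    start: int            # first year the name applies
    end: Optional[int]    # last year the name applies; None means still current


# Hypothetical data mirroring the L001 example (illustrative dates only).
L001 = [
    DatedName("Crewe, Cheshire, England", 1830, 1973),
    DatedName("Crewe, East Cheshire, England", 1974, None),
]


def names_during(names, start, end):
    """Return the names whose validity period overlaps [start, end], oldest first."""
    hits = [n for n in names
            if n.start <= end and (n.end is None or n.end >= start)]
    return sorted(hits, key=lambda n: n.start)


def residence_sentence(person, names, start, end):
    """Lead with the earliest applicable name, then list later names in brackets."""
    hits = names_during(names, start, end)
    if not hits:
        return f"{person} lived in an unnamed location from {start} to {end}"
    place = hits[0].text
    if len(hits) > 1:
        extra = "; ".join(f"{n.text} from {n.start}" for n in hits[1:])
        place += f" ({extra})"
    return f"{person} lived in {place} from {start} to {end}"


sentence = residence_sentence("John Smith", L001, 1960, 1990)
```

Leading with the earliest name is only one possible preference; an application could just as easily lead with the latest name, since the selection of names is kept separate from the wording.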

brianjd 2010-12-02T20:58:17-08:00

Well, good point. My preference here would be to use:

John Smith lived in Crewe, Cheshire, England from 1960 to 1990 (1).

Footnotes:
(1) In 1974, Cheshire, England was renamed East Cheshire, England.

I would always go with the earlier name, but another person might go with the latest name. The data model shouldn't restrict that preference.

It does certainly add to the code any application would need, but we aren't really concerned with the applications. I would however most forcefully suggest we plan with the application requirements in mind. No one will code for a standard that would demand enormous coding effort.

The data model should simply spit out the person, his dates for living in Crewe and every name it was known by during that period.

So a search would be like 'select * from fully_joined_locations_view where (city = "Crewe") and (date_from <= 1960) and (date_end >= 1990)'.

I can write that code in one line. Trivial. Trivial is good when it comes to applications. Making footnotes for it would also be trivial. Writing it to an XML export file would require absolutely no special code, just my one little procedure for creating an XML data line. I already have such code that I could change for BG in a matter of minutes.

The code for exporting of my BG model and yours would take me all of half an hour to write, because it's already mostly done by me. In fact, I am confident, any model we come up with will be trivial. Importing/exporting one *compliant* model is simple. I could do this stuff in my sleep.

The user features are where all the work is.
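
A minimal sketch of the kind of search described above, using Python's standard sqlite3 module. The table `location_names`, its columns, and the sample rows are assumptions made up for illustration (the post only names a `fully_joined_locations_view`). The date test here is an interval-overlap check, which is one possible reading of "every name it was known by during that period"; it is not necessarily the condition the post intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location_names (   -- hypothetical stand-in for the joined view
    lid TEXT,
    city TEXT,
    full_name TEXT,
    date_from INTEGER,
    date_end INTEGER            -- NULL means the name is still current
);
INSERT INTO location_names VALUES
    ('L001', 'Crewe', 'Crewe, Cheshire, England', 1830, 1973),
    ('L001', 'Crewe', 'Crewe, East Cheshire, England', 1974, NULL);
""")


def names_for_period(conn, city, start, end):
    """Return every name of `city` whose validity overlaps the years start..end."""
    rows = conn.execute(
        """SELECT full_name FROM location_names
           WHERE city = ?
             AND date_from <= ?
             AND (date_end IS NULL OR date_end >= ?)
           ORDER BY date_from""",
        (city, end, start),
    ).fetchall()
    return [row[0] for row in rows]


result = names_for_period(conn, "Crewe", 1960, 1990)
```

With the sample rows, a residence that straddles the illustrative 1974 change gets both names back, which is what a footnote generator would need.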

AdrianB38 2010-12-03T03:07:25-08:00

Brian - I reckon all that seems good. And your suggestion for how to display the change of name sounds good - it's a sensible illustration that we need, in order to test the model.

Agree very much with your comments on how much thought we need to have for the apps. And I'm encouraged by your thought that getting the data out would be OK.

greglamberson 2010-12-03T07:57:04-08:00

You guys are really veering off into the application world, I believe. This is almost entirely something that an application would have to resolve. Also in the example of Crewe, Cheshire, England, Crewe changes its name just as Cheshire does. You're going down the road of having places be defined by absolute coordinates on a map rather than existing in the context of their time and culture.
Adrian said, "Partly I want to say - not my problem, it's an application issue, not a data model issue. But that's just cowardice." Well, that's just not true. IT IS ENTIRELY an application issue. Change requires action. Action requires an application, and thus an application has to resolve any issues resulting from a change. Data does NOT change, not in a data model.

"I would always go with the earlier name, but another person might go with the latest name. The data model shouldn't restrict that preference." Again, data is data. The data model isn't going to determine that. How you enter the data is.

Let me just bring this back to another issue: This is a practical effort to house data as it exists in today's applications. How do you think this way of handling places matches up to today's software? How would you handle any discrepancies without distorting data as it exists now?

brianjd 2010-12-03T10:31:29-08:00

Greg,

It's easy to veer. Thought has a natural tendency to drift.

To answer your question, it's really quite simple. The data that gets imported or exported will be obvious in my conception.

One possible simple sample:
<xml>
...
<person UUID='2331120006000599900' pid='I001' ...>
<name nid='N001' template='US Std'>
<namepart npid='NP01' value='John'/>
<namepart npid='NP02' value='James'/>
<namepart npid='NP03' value='Smith'/>
</name>
</person>
...
<location lid='L001' >
<link class="person" ref="I001" />
<link class="event" ref="E002" />
<link class="object" ref="Obj001" type="jpeg" />
...
<LocationName lnid="001" template="UK Genl">
<Locarray loc1="Crewe" loc2="Cheshire" loc3="England" LocDate="1/1/1960" LocDate2="12/31/1973" LocSubOf="L004"/>
</LocationName>
<LocationName lnid="004" template="UK Genl">
<Locarray loc1="Crewe" loc2="East Cheshire" loc3="England" LocDate="1/1/1974" LocSubOf="L005"/>
</LocationName>
</location>

<location lid='L002' >
<LocationName lnid="002" template="County">
<Locarray loc1="Cheshire" loc2="England" LocDate="1/1/1960" LocDate2="12/31/1973" LocSubOf="L003"/>
</LocationName>
</location>

<location lid='L003' >
<LocationName lnid="003" template="Country">
<Locarray loc1="England" />
</LocationName>
</location>

<location lid='L005' >
<LocationName lnid="005" template="County">
<Locarray loc1="East Cheshire" loc2="England" LocDate="1/1/1974" locParent="002" LocSubOf="003"/>
</LocationName>
</location>
...
</xml>

That example is for a BG export/import. A BG file for older-style GEDCOM data would be handled as an XML version of GEDCOM:

<PLACE id="L001" value="Crewe, Cheshire, England" />
<PLACE id="L002" value="Cheshire, England" />
<PLACE id="L003" value="England" />
<PLACE id="L004" value="East Cheshire, England" />
<PLACE id="L005" value="Crewe, East Cheshire, England" />
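
As a rough illustration of how the structured form could be flattened into GEDCOM-style place strings, here is a minimal Python sketch using the standard xml.etree.ElementTree module. The `<locations>` wrapper element, the trimmed sample, and the `place_strings` helper are assumptions for the sake of a self-contained example; only the `location`, `LocationName` and `Locarray` names and the `loc1`..`loc3` attributes come from the sample above. Unlike the PLACE list, this sketch reuses the location's `lid` for every name rather than minting a new id per string.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down version of the sample, wrapped in one root element.
SAMPLE = """
<locations>
  <location lid="L001">
    <LocationName lnid="001" template="UK Genl">
      <Locarray loc1="Crewe" loc2="Cheshire" loc3="England"/>
    </LocationName>
    <LocationName lnid="004" template="UK Genl">
      <Locarray loc1="Crewe" loc2="East Cheshire" loc3="England"/>
    </LocationName>
  </location>
  <location lid="L003">
    <LocationName lnid="003" template="Country">
      <Locarray loc1="England"/>
    </LocationName>
  </location>
</locations>
"""


def place_strings(xml_text):
    """Return (lid, 'part1, part2, ...') pairs, one per Locarray element."""
    root = ET.fromstring(xml_text)
    places = []
    for loc in root.iter("location"):
        for arr in loc.iter("Locarray"):
            # Keep only loc1, loc2, ... and order them by their numeric suffix.
            parts = sorted(
                (int(key[3:]), value)
                for key, value in arr.attrib.items()
                if key.startswith("loc") and key[3:].isdigit()
            )
            places.append((loc.get("lid"), ", ".join(v for _, v in parts)))
    return places


flattened = place_strings(SAMPLE)
```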

brianjd 2010-12-03T10:52:42-08:00

Man, I wish there was an edit feature for comments!

<location lid='L001' >
...
<LocationName lnid="001" template="UK Genl">
<Locarray loc1="Crewe" loc2="Cheshire" loc3="England" LocDate="1/1/1960" LocDate2="12/31/1973" LocSubOf="L002"/>
</LocationName>
...
</location>

There are a few other errors in my example. Note that using the BG model there is actually one less location/place. In the new model there is no need to duplicate Crewe.

I think that entities linking to locations in this model would need to have two pointers, one to the location and one to the location name. But any entity might have multiple locations, and hence multiple pairs of reference links. I included a few reference links in the above example for place. In this case, there is a link to John Smith, though I didn't show the reverse link in John Smith's record. There is also a link to an event discussing the reason the county got a new name in 1974, and an aerial photo of Crewe, recorded as an object.

brianjd 2010-12-03T10:57:33-08:00

I hate posting a third time. I wanted to address the discussion of the "application world". I think it needs discussing, and not as an afterthought. Anything we design is going to have to be implemented by an application. We should endeavor to make it desirable to code for our proposed model. Hence for any data decision we make there should be consideration of the programming impact. The data drives the application. The most elegant and beautiful data model in the world is worthless if it is too hard, or too much work, to implement.

greglamberson 2010-12-03T15:09:46-08:00

Brian,

Yes there is a natural tendency to drift. Not being able to edit is extremely annoying as well. I have taken to copying my post, deleting it, then reediting unless someone beats me to posting again.

My concern with what you're doing is the assumption you're making about how current software handles parent and subordinate locations. The structure you suggest is pretty radically different from what GEDCOM does, and I don't think you can assume this is how apps handle places.

I didn't expect you to use different templates for County and UK General. Why did you do that? We should have at least a dozen defined subfields for place fragment names. Why wouldn't you just define a county using a UK General Template?

I totally agree about the consideration of the application world. No matter what we come up with, we will drastically affect genealogical applications. Any structural change we make that veers significantly from the GEDCOM model has to be carefully considered. We've got to know how to get from there to here in such cases.

I'm not sure this works yet.

mstransky 2010-12-03T15:37:10-08:00

"No matter what we come up with, we will drastically affect genealogical applications."
how can this happen if...

"Any structural change we make that veers significantly from the GEDCOM model has to be carefully considered."

GEDCOM is limited; apps working within its limitations become limited.

If we cannot change to something that may look very different from GEDCOM but can
1. still translate GEDCOM back and forth,
2. give users what they need to better their progress, and
3. also translate to post-2011 apps that jump on board;

if we deny any new structural concepts that cover those concerns and letdowns since 1998-2001,

how will anyone offer a better source-logging database, or person tracker and evidence crumb trails, or even adopt a very easy tracker of local change over time?

It's kind of like saying we wish to make a turbocharged vehicle on the frame of a Model T.

I think we need to loosen the grip of mimicking GEDCOM, yes or no?

brianjd 2010-12-03T16:16:01-08:00

Well, I'm not surprised that I'm doing the unexpected.

Partly, I'm brainstorming in what I'm doing, throwing things out to see how they might work or not. I have never written a genealogy program. I've only ever been exposed to Lifelines, Gramps and PhpGedView. I think there was another I tried and hated, but can't remember.
I wouldn't know a professional genealogist unless they knocked me over with a cart of microfilms. Almost everything I know of genealogy I taught myself. Or learned from mail-lists. I don't read Evertons, and haven't finished my SAR membership papers yet. Volumes could be filled with what I don't know about genealogy. That's not to say I'm a newbie. I frequently help others with research and offer advice on how to do things and I document well, if not meticulously.
So, I'm throwing things out, and hoping others will add to or correct my failings. Hopefully along the way I'll stumble on something others find useful. I'd be happy just giving feedback on others models, and suggesting things that I think are missing based on my own data entry of thousands. Not to mention what I'd like to see for my transcription project of everything in Urloffen from 1650 to 1800. Something that requires a lot of evidence records and at some point hopefully conclusion people.

I'm not really assuming how anything is handled in current software. But, that's a good point about GEDCOM. I wasn't really trying to remain compatible with GEDCOM, since this is going to be an XML implementation. But perhaps too much change is too much. I was basically doing a ground up rebuild. I'll have to give this some more thought and study. Not that I've expected anyone to fall in love with my model. Hopefully I can get others thinking outside the ideas they are familiar with, and contribute something useful to the final model.

greglamberson 2010-12-03T19:43:43-08:00

mstransky,

Carefully considering what currently exists is not the same thing as adhering to it. I only say GEDCOM because that's what most apps are based upon.

Who said anything about "deny[ing]" any new structural concepts? I don't think there's been any lack of new concepts floating around.

We've never started with any particular model. Nothing's changed. However, in looking at new concepts we have to consider how they affect what DATA currently exists, not simply what we would design from scratch ourselves.

greglamberson 2010-12-03T20:25:16-08:00

Brian,

Great. I'm just trying to provide you with some other things to think about.

brianjd 2010-11-30T19:16:25-08:00

Adrian, excellent point. It is definitely my desire to go that route. My thought was to make this entity serve both purposes and truly support every international location syntax. The French do it differently than NY, NY does it differently than Texas, and Germany is different again! It seems many nations and states have their own mapping.

I've updated this page to reflect that. I've been frustrated with the way Gramps and probably pretty much everyone else is currently doing it, because no one handles all the formats.

Just like with name, the use of templates could help close many gaps.

AdrianB38 2010-12-01T13:42:58-08:00

Good. I don't think there's any point in trying to define more than a handful of basic formats - let's just employ templates. (Somehow!)

The bit that worries me about your Place entity is that while you have dates in there, there's no capability of having more than 1 name (or anything else). E.g. if "Crewe, Cheshire, England" changes the hierarchic name to "Crewe, East Cheshire, England" after the split of the county, wouldn't your model need to treat those as 2 entities?

It was for this reason that I was going to make a location entity's characteristic attributes (e.g. name and, err???) separate from the location entity - and presumably they'd have the same entity type as a person's Characteristics?

brianjd 2010-12-01T16:53:36-08:00

Yes, I think we'd want to create a new location as a child when a name changes, placing the date of the name change as the end date for the parent and as the beginning date for the child(ren).

And therein lies the rub. There are a few things that need to be resolved.

It would be Cheshire that gets the new child, not Crewe. Crewe as a place hasn't changed, but the higher jurisdiction did. Since we have the place entered with a template, we know the hierarchy, and when it is entered it could and should be split out into its respective jurisdictional places. I neglected to add a jurisdiction pointer in my model. So Crewe would have a pointer to Cheshire as a jurisdiction.

So, Crewe would be a place, with a link to Cheshire as its higher jurisdiction. The name changes, or the county splits or merges. All of the subordinate places now need to be resolved. So they would all need to have pointers to the original jurisdiction and to the new jurisdiction. This could potentially be a lot of work, especially if no one writes a nice wizard/tool for helping resolve it.

Yeah, you have a point there. You have a location, and then you have a name array which stores the date(s) covering each name, since we can hope that the location isn't a floating island and that its latitude and longitude are not changing faster than the continents are drifting. The name would also have to store the template type.

However, even with your convention, you'd still have the issue. Ok now you have one location, but you now have to fix the names for all the other locations that were in Cheshire and elsewhere and are now in East Cheshire or Lancaster, or ???.

It would be trivial for a program to pull up every former Cheshire location and have the user select those going to East Cheshire. It could then prompt for the new location that needs to be created for what's left, and continue until every place in the data is resolved. But it can do nothing to fix those Lancaster locations now a part of East Cheshire.
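
To make that concrete, here is a minimal Python sketch of the "pull up every former Cheshire location and reassign the chosen ones" step. It assumes, purely for illustration, that each place carries dated jurisdiction links; the `JurisdictionLink` class, the `children_of` and `reassign` helpers, and the sample places are hypothetical, and the 1974 date is the illustrative one used in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JurisdictionLink:
    child: str
    parent: str
    start: Optional[int]   # None means the start year is unknown/open
    end: Optional[int]     # None means the link is still current


def children_of(links, parent, year):
    """Names of places whose jurisdiction is `parent` in the given year."""
    return sorted(
        link.child for link in links
        if link.parent == parent
        and (link.start is None or link.start <= year)
        and (link.end is None or link.end >= year)
    )


def reassign(links, chosen, old_parent, new_parent, year):
    """Close each chosen place's current link to old_parent and open one to new_parent."""
    new_links = []
    for link in links:
        if link.child in chosen and link.parent == old_parent and link.end is None:
            link.end = year - 1
            new_links.append(JurisdictionLink(link.child, new_parent, year, None))
    links.extend(new_links)


# Hypothetical sample data: two places under Cheshire before the change.
links = [
    JurisdictionLink("Crewe", "Cheshire", None, None),
    JurisdictionLink("Chester", "Cheshire", None, None),
]
before = children_of(links, "Cheshire", 1973)

# The user picks Crewe as one of the places going to East Cheshire.
reassign(links, {"Crewe"}, "Cheshire", "East Cheshire", 1974)
```

Because the old link is closed rather than deleted, a query for 1973 still finds Crewe under Cheshire, while a query for 1974 finds it under East Cheshire.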

But in the end, when being exported for transmission, you'd have a location for Crewe and all its names for the time period covered. Therefore you'd have to have a way to format Crewe's location record so that any record pointing to Crewe knows which name to use.

Now there's a nice riddle.
