This page is the starting point for major discussions on a genealogical data model for BetterGEDCOM. Also, thoughts, comments and other input on the discussion tab of this page above will be added here. If you have significant input to the BetterGEDCOM data model, please add it here. If you have a comment, a minor point or something you don't feel is quite clear enough to add, please place it on the discussion tab.

Data Modeling Introduction

To begin with, you might want to study the Data Modeling article at Wikipedia to understand how data modeling is supposed to work. This is meant to be useful and instructive rather than trying to get anyone to adhere to a rigid process or structure.

Bottom Up vs. Top Down

There are those who say this effort should take each data element, one at a time, and build a data model from the bottom up. Others say this approach is foolhardy, and that one must start with an overall philosophy first. There are ample opportunities for both approaches, and the discussions in each will obviously influence the other. Whether you favor a "big picture" or a "devil in the details" approach, you should be able to find a place to hash out your ideas.

Bottom Up Approach

This approach states that while the data model is of critical importance, it will be built incrementally, element by element, rather than with any preconceived ideas from other models. All other models are a source of inspiration but none is a blueprint. This approach starts with each core element, carefully defines it and slowly builds up the data model. In this way it seeks to build the best practical model rather than one that matches any particular philosophy from the outset.

Top Down Approach

This approach begins with a particular philosophy and lets this philosophy develop the elements of the data model. Core to this approach is a study of previous data models and understanding what each is trying to achieve. Certainly no work here can be done without some knowledge of data models and their philosophies.

Add Your Voice To the Discussion
This is a community project, and your opinion is valued and indeed needed. If you don't see the part of the discussion you think is important, it is because you haven't added it. Please jump in and participate, or your valuable input will be missed! If you have general comments to make or don't know where to put your ideas, just use the Discussion tab above to write your thoughts, and they will be added here by the moderators. If you see a section that pertains to an issue you feel you can elaborate on or help with, please go ahead and edit this page. Any ancillary elaboration that would clutter this main discussion can be added via the Discussion tab above. A rule of thumb: if your comments clarify or help define better, add to this page. If your comments are part of a debate, use the Discussion tab.



Individual Data Elements Discussions


Evaluation Of Existing Data Models


Formulation Of The BetterGEDCOM Data Model




Comments

xvdessel 2010-11-10T05:57:25-08:00
Location management
I would propose this thread to discuss details related to how locations can/should be stored.

As a starter, I give some issues that I can see with the current GEDCOM and most software.

- language specific naming

- hierarchical locations suffer from historical changes (conquests, wars, treaties, etc)

- not everybody agrees to historical changes: some consider a period as a foreign occupation while others may see this as justified annexation. A typical recent example could be the Falkland Islands, under the British flag, but considered by some to be occupied territory from Argentina.

- one physical location can have multiple hierarchies, depending on the context: administrative, judicial, religious, ...

- matching between such hierarchies would be great (e.g. to match a baptism (religious) to a birth (administrative)), but the covered areas are not always a match (e.g. parish vs. town)

- some locations need to be more an area than a point location, e.g. to state where a profession was performed, or a title was held (a priest for a parish, a town responsible, people from the nobility)

Comments are welcome!
testuser42 2010-11-12T11:14:17-08:00
Maybe this is already common knowledge, but if not it might be helpful:
There's a big database "GOV - the genealogical gazetteer" covering Germany and quite a bit of the rest of Europe here:

http://gov.genealogy.net/Locale.do?language=en&country=us

I think it's been designed and is maintained by the people behind

http://www.genealogy.net/

This is mostly a German project, a vast resource for everything concerning genealogy. There are a lot of intelligent and knowledgeable people involved there -- at least that's what I'm assuming ;-), being just a user of their website(s) and services. But maybe you could get some of these people to contribute here? I'm sure they could add valuable perspectives.


Contact addresses I've found on the website are:
gov-support@genealogy.net
vorstand@compgen.de

Mailinglists of potential interest can be found at:
http://list.genealogy.net/mm/listinfo/gedcom-l
http://list.genealogy.net/mm/listinfo/genealogie-programme
testuser42 2010-11-12T11:21:21-08:00
...to add to my post:
The GEDCOM 5.5EL (meaning "Extended Locations") proposal was developed by the "Society For Computer Genealogy" and a number of authors of German genealogy software.
AntonyKC 2010-11-12T11:46:30-08:00
I would like to suggest that this forum should separate the discussion of the data formats for locations from that of the universal gazetteer, the database linking place names (spatial referencing by geographical identifiers) to coordinates. I am sceptical that the universal gazetteer is actually feasible, given how poorly mapped large parts of the world are, the political issues over authority, etc.

The problems you have been discussing of time-dependent spatial referencing (by geographical identifiers and/or coordinates) are not unique to genealogy, of course. Having been involved in ISO/TC 211, Geographic information/Geomatics, for over a decade I am somewhat biased, but I think that our suite of standards (including ISO 19136:2007, Geographic information - Geography Markup Language) will cater for a lot of your needs, and anything that is missing can be added to the standards.

While ISO standards have to be bought, unfortunately, there is a comprehensive standards guide that can be downloaded from the ISO/TC 211 web site at: http://www.isotc211.org/

Regards
Antony
greglamberson 2010-11-12T12:12:34-08:00
I think we're veering off here a little bit, at least from my perspective.

I do NOT advocate that place entries within BetterGEDCOM be tied to ANY external or universal geographic mapping system or anything like that. People should absolutely be able to specify places as they want to, whether anyone else understands them or not.
I am merely suggesting we add support for passing along information about how a particular geographic mapping system refers to the place the original user is referring to. The actual mapping of that place, the conventions for recording that information, etc., should and would be something done within the genealogical application the person was using.

Remember BetterGEDCOM (for purposes of this initial project) is merely a way to store data uniformly between programs. We're not trying to add features to actual genealogy software, but we do want to accommodate passing along all data that has been accumulated. Right now there are lots of ways users map the places in their genealogy databases, and all I'm advocating is a way for BetterGEDCOM to be able to receive that mapping information and pass it along.

Regarding URIs, I am not sure this is needed or appropriate, as that sort of thing is something the app developer would have to decide upon and implement. However, if URI reference format is appropriate, then great.
greglamberson 2010-11-12T12:26:51-08:00
Dallan, I totally agree that inability to edit discussion comments or even format them is extremely annoying, particularly for a wiki...

Dallan said, "The problem with using user-entered data as the basis for your place database is that users are not good at entering place data..."

Yes, I know this, and this was actually debated extensively during one of the GEDCOM 5.x revisions 15 or 20 years ago. I've read some of the notes. On the one hand, it's a nice concept that everyone would have nicely identified places that conform to some wonderful, universal, noncontroversial geographical mapping system somewhere. But in practice, things just don't work that way. Even if such a database existed, genealogists would still want to be able to cite inexact, vague or idiosyncratic places. I fully support giving folks the option to pass this sort of geographical mapping equivalency information, but I completely oppose making such a system mandatory. This would be like insisting that everyone map their relatives to people who have been recorded in census records. It sounds vaguely ok but in practice it's a horrible idea.
DallanQ 2010-11-13T07:33:47-08:00
testuser42,

WOW - This database is incredible. Germany is the most complicated country I can think of with respect to historical places, and these guys are doing a terrific job. THANK YOU for sharing it.

greglamberson,

I'm not arguing that every user-entered place must correspond to a standardized place. Just that "automatic gazetteer generation" (google it sometime) using user-entered places as your source has problems associated with it. If a records-manager can map a user-entered place to a standardized place, then the user-entered place can be shown on a map without the user having to enter lat/lon themselves; for user-entered places that can't be mapped you don't get this feature. That's all.
greglamberson 2010-11-13T09:31:25-08:00
Dallan, Perhaps this explanation will help resolve our different ideas on location identity and management.

BetterGEDCOM won't care how places are entered. BetterGEDCOM is a genealogical Switzerland: it is neutral on matters of preference for one system or another. Thus, however location information was entered, BetterGEDCOM doesn't care. If there were some location qualifier that the software program used, largely guiding users to enter places that were identifiable, then great. BetterGEDCOM doesn't care. However, the information that identified that place in that system should be able to be exported with the location value and subsequently imported into another software program. Period.

To be useful, would software programmers have to use this feature? Absolutely. Are they required to? Absolutely not. Is it within BetterGEDCOM's scope to require software developers adopt some location management system? Not within this initial project.

Here's an interesting idea: Could we develop a BetterGEDCOM Location Extension that did specify the sort of system you refer to? Absolutely. Such a project would be exactly the sort of thing that BetterGEDCOM would like to see develop as a secondary project. Within the scope of this initial project, however, such an effort would be counterproductive in that it would force developers to adopt a particular approach to location management, and that would be a deal-breaker for lots of vendors.

Does this resolve our differences?
DallanQ 2010-11-13T15:45:54-08:00
Sorry, I didn't think we had any differences. I'm fine with storing places as free-form text strings.
louiskessler 2010-11-29T21:03:12-08:00

xvdessel:

I have expressed it in the "Location Entity Over Time" thread: http://bettergedcom.wikispaces.com/message/view/Location+entity/30668879

but I'd like to re-emphasize that I'm totally against a location-time stamp combination.

I feel a specific location should be a single entity. If it changes names over time, then that should be within the entity. My example (using extended GEDCOM) is:

0 @P43@ PLAC
1 NAME Townsville
2 DATE From 1832 to 1912
1 NAME Citiesville
2 DATE From 1912
1 LATI N18.150944
1 LONG E168.150944

I believe this will quite easily handle DearMYRTLE's concern.

In other words, the entity should be "Location" and NOT "Time-Location".

Adding time onto any entity (known as time-stamping) gets into huge complexities. If identical places in different times are kept separate, then how will it be known they are identical? You'll have to add a connector - the first level of complexity. I was involved in an SAP data warehouse project, where the designers insisted on time-stamping the data. It was a disaster.

If a person changes name, or sex, or hair color, do you want to make them a "time-person" entity?

I also am one to believe that events can and should be assigned to locations, rather than just people, when it is appropriate in that they pertain to the location (e.g. Fire in 1848), but that's another matter.
louiskessler 2010-11-29T21:08:06-08:00
... and I don't like places as free-form strings.

People are messy and uncontrolled, and without structure, they will create a mess of places. That will make it impossible for a program to put together a useful Place Index. Putting data together from multiple people and trying to organize places will be a disaster.

I'm for structure here.
louiskessler 2010-11-29T21:12:43-08:00
Greg:

I would not start a secondary project before the first one's even off the ground.

If a secondary project is required, then the model selected is much too complicated.

Even talking about a gazetteer of places is way beyond what BetterGEDCOM is about.
greglamberson 2010-11-29T23:39:09-08:00
Louis,

Adding a time element to a location entity is not at all the same thing as time-stamping. Time-stamping refers to adding an indicator to the data specifically related to when the data was last changed.

If the key for the Location entity were a combination of time and place, yes, that could be a problem. I think the key for nearly every entity we use will be a UUID. The issues of duplication you raise will therefore be moot.

Regarding location structure, that's the job of the software app. We're just trying to provide a place for data to reside, however it looks when we get it.

Regarding other projects, there are several future projects in mind. That has no bearing on this project.
DearMYRTLE 2010-11-11T21:41:46-08:00
With regards place structure -- what about a US town that existed historically in one of 3 counties in a state, but now exists in a 4th county?

What about working with GoldBug? Art has passed away, but his coder still works the AniMap product.
xvdessel 2010-11-12T02:25:56-08:00
When I proposed the idea of a reference database, I think (and this seems to be confirmed by several comments above) that we should go for a database that includes time as a dimension: no location should be defined without an associated time delimitation (which could be infinity on the start and/or end time). Obviously, each location, no matter at what time it existed, should also have world coordinates (or an area definition if possible). And ideally such a location should be hierarchically linked to related larger locations (town, county, ..., country). It could be more than one higher location, if the parent location changed in some way during the lifecycle of the location itself (e.g. a town can exist for 300 years, but have belonged to 3 different counties over that period).

I would therefore propose the naming convention of a "time-location" as follows:
A time-location is a point or an area that corresponds to a commonly known location during a specific period of time. One time-location could have multiple area definitions if its area changed over the period of its existence. A time-location could be independent of any organization (e.g. nature locations), but most time-locations should be related to an organization structure (administrative, judicial, religious, ...) which often implies a hierarchical relation to one or more time-locations of a higher level in the same organization structure.
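To make this concrete, here is a rough sketch in Python (purely illustrative; every field name is hypothetical and not part of any proposal) of what such a time-location record could look like:

from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class TimeLocation:
    name: str                            # commonly known name, e.g. "Aartselaar"
    organization: str                    # "administrative", "judicial", "religious", ...
    valid_from: Optional[int] = None     # year; None means open-ended ("infinity")
    valid_to: Optional[int] = None
    parents: List["TimeLocation"] = field(default_factory=list)   # higher-level time-locations, possibly several over time
    areas: List[Tuple[Optional[int], Optional[int], str]] = field(default_factory=list)  # (from, to, boundary or point definition)

town = TimeLocation("Aartselaar", "administrative", valid_from=1750)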

For users, this could mean the following: if a piece of software supports such a reference db system, then you would first select a number of time-locations that you want to use in your data. If you selected one time-location (e.g. your currently existing town), the software could recommend that you also select historical versions of that location, or related areas that cover a similar location (e.g. a parish).
Whenever you then want to enter a location for an event (e.g. a birth), you would be presented with locations that are time-relevant: It does not make sense that a birth in 1970 took place in a town that only existed up to 1920. Moreover, the system could insist that a baptism should be located in a parish rather than a town. But still, as both a baptism and a birth event have time-locations with deductible coordinates, a good software could match one to the other.

What is important for the end user here is that such system implies a more strict data entry. Just have a look at this page:
http://en.wikipedia.org/wiki/Alexandria_%28disambiguation%29
and you will understand that a software that stores simply "Alexandria" as a location can never expect to have such data exported and reused with any degree of usability. Hence, the end user should first indicate which "Alexandria" he commonly wants to refer to (e.g. Town=Alexandria, County=Jefferson County, State=New York, Country=United States of America) and for which time periods (depending on the historical changes related to this town). When exporting, the exact data (in a format to be decided, but this is less relevant to the user) can then be provided, which ensures that the receiver (software and hence the end user) cannot misunderstand that time-location.

To all previous posters:
are there any open initiatives today that target such a time-location database?

Xavier
xvdessel 2010-11-12T02:58:09-08:00
I agree with Greg that we should not enforce the use of one location system or another (Google maps, ...), but the standard should reference in a unique way a time-location (see above) in some commonly available database. I would therefore recommend having a look at URIs. Most people only think of http as a URI scheme, but there are many others, both official (e.g. mailto for mail addresses, geo for physical locations) and unofficial (e.g. secondlife, for a location in the virtual world). We could go for such a structure, simply stipulating the structure of the URI. This does not bear any link to databases or software that stores/uses it (just like http: can be used by many web servers and browsers).

An example syntax could then be:

timeloc:<server>/<unique-time-location-id>
or
timeloc:<server>/<organization-type>:<time-location-id>[/<lower-level-time-location-id>[/..]][?[time=[from-time]-[to-time]]][<other-optional-attributes>]

The first version requires a lookup on the site of the data provider to obtain all the details. The second contains much more details but has a risk of becoming unusable when something changes in the database. A simple example:
Today I record this location:
timeloc:thetimelocserver.com/administrative:belgium/antwerp/aartselaar?time=1750-
because the mentioned town still exists. But tomorrow it could no longer exist, in which case the correct reference would become:
timeloc:thetimelocserver.com/administrative:belgium/antwerpen/aartselaar?time=1750-2010
This is similar to a web link that stops working over time. Using some good tools however, one should be able to rebuild a more correct version of the broken URI afterward (just like some pages get linked to their new hosts), but it may cause problems.
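Purely as an illustration of the idea (nothing here is a defined scheme; the server name and field layout are assumptions), a few lines of Python showing how such a timeloc URI could be taken apart:

def parse_timeloc(uri):
    scheme, _, rest = uri.partition(":")        # "timeloc"
    rest, _, query = rest.partition("?")        # split off the optional "?time=..." part
    server, _, rest = rest.partition("/")       # "thetimelocserver.com"
    org, _, hierarchy = rest.partition(":")     # "administrative" / "belgium/antwerp/aartselaar"
    time_span = query[len("time="):] if query.startswith("time=") else None
    return {"server": server, "organization": org,
            "hierarchy": hierarchy.split("/"), "time": time_span}

print(parse_timeloc("timeloc:thetimelocserver.com/administrative:belgium/antwerp/aartselaar?time=1750-"))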

Xavier
xvdessel 2010-11-12T03:02:10-08:00
To avoid confusion, in my last example, I did not mean to change the antwerp into antwerpen, only the period change is relevant.
If an editor can change this, feel free to do so.

Sorry
DallanQ 2010-11-12T03:56:10-08:00
The only open initiative that I know of is at WeRelate.org, which is an open-content wiki. If there were another initiative, I would be happy to adopt it.
dsblank 2010-11-12T04:18:11-08:00
DallanQ, is there an API for, say, downloading WeRelate's places?

Also, I guess you could figure out what a place was called at a given time by looking at the records that reference the place. For example, if "Ben Davis, Indiana" was referenced by an event which had a date from 1926, then that would give you some data about when that place-name was used. However, I'm not sure everyone references places by their time-sensitive names.
DallanQ 2010-11-12T04:24:00-08:00
xvdessel,

I wanted to clarify my last comment. The only open-content database that I know of that attempts to integrate time+place is the set of place pages at WeRelate.org. These place pages are available in XML format under an open-content license. If there were another time+place open initiative, I would happily drop the one at WeRelate. But I haven't been able to find anything. And I've looked fairly extensively. There are some databases, like TGN (which is not open-content), that include _some_ historical information, but it's very limited.

greglamberson,

Yes, what I'm suggesting is that if you don't want to force the user to create this table by entering the data for each place themselves, then the software needs to have a way to generate this information based upon the string the user has entered (e.g., "Shoreview, MN" becomes L2=Shoreview, L3=Ramsey, L4=Minnesota, L5=United States, Lat=..., Lon=...).

The question is, how is the software going to do that? Either you put the onus on each software developer to create their own database of historical+current places (which is what's currently the case), or you create a free database of historical+current places (where you can represent that a town was in 3 counties and is now in a 4th). If you want to go down the latter route, how will you do that? Either you create your own database from scratch, or you generate your database from data that someone else has. I couldn't find anyone with historical data, so I created the pages at WeRelate and make them available under an open-content license.

So you could take the pages at WeRelate as your database.

Or you could create your own database from scratch and think of WeRelate as one of the providers.

Or you could punt and say that every genealogy vendor is on their own to convert user-entered place strings to structured place data.
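Whichever route you take, the lookup step itself is simple once the database exists. A minimal sketch (Python; the table contents and field labels are illustrative only, coordinates omitted):

PLACE_DB = {
    "shoreview, mn": {"L2": "Shoreview", "L3": "Ramsey", "L4": "Minnesota",
                      "L5": "United States", "Lat": None, "Lon": None},
}

def standardize(user_string):
    key = user_string.strip().lower()
    return PLACE_DB.get(key)    # None means the string could not be standardized

print(standardize("Shoreview, MN"))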

DearMYRTLE,

You can download GIS (shape) files for historical county boundaries here:

http://publications.newberry.org/ahcbp/

You could use this information to say which counties a town would have been in over time, so long as you knew when that town was founded. But this is US-only. I haven't been able to find similar resources outside the US. The best website I've found is statoids.com, but they just talk about high-level changes.
DallanQ 2010-11-12T04:55:22-08:00
greglamberson,

I wish I could edit my responses (kind of annoying that I can't do this in something that calls itself a wiki).

"So you could take the pages at WeRelate as your database" would be clearer as "So you could take the pages at WeRelate and use them to create your database".

dsblank,

There isn't an api. Like wikipedia, you download the data in a big XML file and you have to process it yourself. A file is available for download at:

http://backup.werelate.org/pages.xml.gz

Be aware that it's rather large - it decompresses to 6G. It contains *all* pages at WeRelate. You'll have to process it to extract the place data yourself.
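For anyone who wants to try, here is a rough sketch of that processing step (Python; it assumes the dump follows the usual MediaWiki XML export layout, which should be verified against the actual file):

import gzip
import xml.etree.ElementTree as ET

def local(tag):
    return tag.rsplit("}", 1)[-1]              # drop the xmlns prefix, whatever export version is used

def iter_place_pages(path):
    title, text = "", ""
    with gzip.open(path, "rb") as f:
        for _, elem in ET.iterparse(f):
            name = local(elem.tag)
            if name == "title":
                title = elem.text or ""
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                if title.startswith("Place:"):
                    yield title, text
                title, text = "", ""
                elem.clear()                   # keep memory bounded on a multi-gigabyte dump

# for title, wikitext in iter_place_pages("pages.xml.gz"):
#     ...  # pull the jurisdiction and coordinate fields out of the wikitext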

Once you did this and extracted the fields from the place pages into a database, you could use that information to create a "user-entered place string to structured place data" service. That's what I do to standardize user-entered place strings when gedcom's are uploaded.

The problem with using user-entered data as the basis for your place database is that users are not good at entering place data. Suppose that you have 10,000 users, and 10 of them have a place called "Shoreview, Hennepin, Minnesota" in their database. Does that mean that Shoreview was once in Hennepin county? Or does it mean that those 10 users were confused? You don't really know.

You're better off if you can access the places found in the original records, but they have their own problems. Place-strings found in original records often use abbreviations and lack mid-level jurisdictions. FamilySearch, for example, uses their own internal place database to map place strings found on historical records to standardized places because of these issues.
dsblank 2010-11-12T05:50:22-08:00
DallanQ, thanks for the detailed info! This might be useful for other reasons, too. I'm always looking to see how Gramps can connect to other resources.
xvdessel 2010-11-12T06:00:28-08:00
DallanQ, 2 reactions:

- maintenance of such a DB: I think you need to use typical Web 2.0 techniques here. End users could propose records for addition, but only editors could accept them. Editors can be responsible for a specific hierarchical section (I believe DMOZ (www.dmoz.org) works like that as well), e.g. one or two counties in the US, on which they can prove their knowledge by validating and accepting user proposals. After some time, they can then evolve to editors at higher levels.

- regarding the XML of an expected database. Ideally, such a database should be accessible by other means than a full dump request. Via a specific request (see my URI discussion above) such a database should be able to provide only the records that relate in a particular way to some given input data (e.g. an existing time-location, or a lat/long position, or even simply a town name and a date). Depending on what is asked, the database could then provide child locations (e.g. all locations within one county or state), all historical time-locations that map within distance X of the lat/long coordinates, etc. In such a way, any software developer can embed standard XML calls to such a db to fetch the right data. Another element could be to provide any known translation or name variant of a time-location as an answer (e.g. Aachen, Aken and Aix-la-Chapelle all refer to the same German town). Obviously, the software would then store that data locally (a kind of cache) for future reference, but in case the user enters an unknown location, the database can provide possible time-locations that match.
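Purely to illustrate the kind of call I mean (the server name, endpoint, parameters and response format are all hypothetical, not a proposal):

import json
import urllib.parse
import urllib.request

def find_time_locations(server, name, year=None, lat=None, lon=None, radius_km=10):
    params = {"name": name}
    if year is not None:
        params["year"] = year
    if lat is not None and lon is not None:
        params.update({"lat": lat, "lon": lon, "radius_km": radius_km})
    url = "https://%s/timeloc/search?%s" % (server, urllib.parse.urlencode(params))
    with urllib.request.urlopen(url) as response:
        return json.load(response)             # expected: a list of candidate time-locations

# candidates = find_time_locations("thetimelocserver.com", "Aachen", year=1820)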

I think such databases should be free in consultation mode (to allow anyone that receives a file with such reference to look it up), but maybe not necessarily in search mode (i.e. when a user in a software enters some elements and wants to search the database for any matching time-location).

I'm not sure whether it should be BG that defines such a URI and the required functionalities, or whether it should be a separate initiative. But I think it is definitely linked in some way to a new BG standard, as the current way of location storage in GEDCOM is really a mess (partially the standard, but more importantly how it is used by most software today).

And yes, I also hate it not to be able to re-edit my posts.

Xavier
greglamberson 2010-11-12T06:28:03-08:00
Before we get too far here, let me throw down a few concepts I also think are key.

First, I do agree that a location entity has a time and space element. I do not think that this combination defines the place absolutely (that is, I don't think the combination of these two items represents a key in a database table). A location entity should perhaps have a UUID-type key as well, because I think two places can have the same time and name components, even in a hierarchical naming system.

Any mapping of location entities to external mapping systems should be a sort of relational mapping, not some indication of absolute equivalent identity. In other words, there should be no attempt to combine the two into one element. This is probably more important for the future when this might be possible, but I still think it's an idea we should codify now.

The location database I favor is one that exists solely in a single database. This database's location entities _can_ have relationships with those in other, perhaps "authoritative" location databases, but they certainly don't have to.
xvdessel 2010-11-12T07:40:32-08:00
Greg,

Then I have some fundamental concerns about BG. If you don't want, at some point in time, to enforce people/software to use some means of time-location unification that can be matched to some other user's data using a similar unification, then you will continue to end up with unmatchable location data like we have today.

Simple example: a search for the family name Claes and the town of Rumst on Geneanet, leads to these forms:
Rumst Antwerpen, Belgium
Rumst Belgium
Rumst (B) Antwerpen, Belgium
Rumst,2840,vlaanderen,belgië Antwerpen, Belgium
Rumst,provincie Antwerpen Antwerpen, Belgium
Rumst(Reet) Belgium
Rumst,2840 Belgium
Rumst, Antwerp Antwerpen, Belgium
Rumpst Belgium

(Rumpst is the old written form. Reet is part of Rumst, but at the date mentioned by the researcher, Reet no longer existed as an official town. You are lucky no French researcher wrote things like Rumst, Anvers, Belgique)
I know the town very well, and I'm pretty sure all of them refer to one single town (which has changed in size over time, but even that is unrelated to the above diversity).

Hence, I hope that, probably in a level approach (see other discussion), at some point we should enforce the use of relations to centralized databases for locations. I think software vendors are much closer to such changes than to decision process storage aspects.

I agree that there will remain exceptions, e.g. when a location is only partially readable. But these are unmatchable by nature, and should only be used as an exception. The same holds for some date information that is de facto invalid (e.g. a date of February 30th) but should be recorded as such. Some software offers two date fields for such cases (a free-form one and a standardized one, used for sorting, calculations, ...). Again, such exceptions should cover only a few percent of the data, and this is not what we complain about in GEDCOM.

Regarding your first point, I'm not sure I understand it. A date and a location are indeed not the (primary or unique) key, because (see my time-location proposal) the date should be a time period, not a single date: locations start to exist (e.g. they are founded) and can cease to exist (like the former East Germany). Locations are only relevant within that period. Potentially, at some later date, that same name & hierarchy is reused for a similar time-location, but that is a different period and thus another time-location. And the time period can extend to infinity (e.g. for all places that currently exist).

The unique key is thus:
- a time period
- an organization structure (E.g. Government-administration)
- a hierarchy tree starting from the world top level down to the desired detail level (e.g. a town, a church, ...)

I live in a very weird country (Belgium) which has completely different structures depending on the area of competence. And this leads to some political troubles here. Typically the federal election areas do not map to the regions or communities that control things like towns.
But still, if you are precise enough in the organization you are covering (e.g. Government-administration is not the same as Government-Judicial or Government-Election areas or Government-Military or Government-Diplomatic maybe, to include embassies etc), then I'm pretty sure you can obtain uniqueness.

Note that I did not mention a geographical location or area yet. Over the given time period, the time-location can have evolved in size or even location (a moving church?? Why not?). Hence, the boundaries & the related timespans when they were valid are information elements that belong to this location.

Maybe it would make things easier if I would use timespan-location as a name?

As far as I know, there have never been two distinct locations that existed at a common point in time with identical hierarchies within the same organization structure, except for very delicate cases like some that I mentioned in my very first post to this topic (war, occupation vs. rightful ownership, e.g. Jerusalem or the Falklands). Some locations can have unclear borders at some points in time, and some organizations have had deviant structures (alternate popes in France etc). There are language issues etc (see also my first post).

Xavier
hrworth 2010-11-11T04:14:59-08:00
A User Question:

Is this Location Database, External to the genealogy program on my PC?

Russ
greglamberson 2010-11-11T04:46:26-08:00
Russ,

Currently there are several different things that happen:
1. Existing GEDCOM uses place entries that are not part of a database structure. This means:
a. The same place isn't represented once as a database entry and merely referred to; instead each instance of a place is handled as a separate entry; and
b. The place names as contained in a GEDCOM file are simple text entries that don't have any geographical referential information other than the name. That is, there is no placement of the GEDCOM place information on any mapping system. Any determination of a place's actual location is done by genealogical software based upon its ability to interpret the name as typed;
2. Genealogical software programs have place databases. These place databases may or may not have the ability to use external mapping systems (e.g., GPS, Google Maps, GNIS, etc.);
3. External mapping systems: Systems like Google Maps provide the ability to equate geographic names to places on mapping systems either internal to their systems or via more universally recognized methods such as longitude/latitude, etc.

What all this means, practically speaking, is that within your genealogy software or service, you may have to resolve your places to various external mapping systems depending on the capabilities of the genealogy software or service. The moment you export your data via GEDCOM, any place mapping that was done within a particular genealogy application will definitely be lost.

I believe we should do 2 things:

1. Establish what would essentially be a table of geographic names within BG's structure rather than treating each place entry in each record as a separate entry; and
2. Include referential data fields for several external geographic mapping systems so that once a place has been paired with a place in one of these systems that the process would not have to be repeated after any subsequent exports and imports of data.
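To illustrate points 1 and 2 (the structure and names below are only a sketch, not a proposal; the WeRelate entry is one example of an external reference that could be carried along):

place_table = {
    "bg-place-0001": {
        "name": "Cincinnati, Hamilton, Ohio, United States",
        "providers": [
            # one entry per external mapping system: which system, plus whatever
            # that system needs to find and locate the place again
            {"provider": "werelate.org",
             "reference": "http://www.werelate.org/wiki/Place:Cincinnati,_Hamilton,_Ohio,_United_States"},
        ],
    },
}

# an event would then refer to the table entry instead of repeating the place string
event = {"type": "BIRT", "place_ref": "bg-place-0001"}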
hrworth 2010-11-11T06:56:15-08:00
Greg,

Thank you. I still may be a little confused, but let me see if I understand this.

Right now, my software asks me to resolve a placed name. It is matched up to an external mapping system. I can accept the external "place name" or Ignore the suggested "place name".

The external mapping system is current information. My data may be hundreds of years old. I have to work around that within my program, because I may know that the jurisdiction lines changed over time. The "point on the map" didn't change, just its name.

However, due to some features that are available to me in the program I use, I may double-enter the name of that pin point. Depending on what I am doing with that information, I may change which pin point I want to use for that feature.

When Sharing my research, I would want to share the selected data entry (Historical or Current) to the person I am sharing my information with. I should be able to mark which entry or both entries (could be more), then allow the person I am sharing the information deal with the information I am providing. Their External Mapping System may be different than mine.

The external mapping system would have to have a Time sensitive table, I think, if there is an external system in place.

The issue that I DO think needs to be addressed is the format of the data in that entry.

A Time Element should be looked at. If no entry, Current place would be default.

The various levels of Jurisdiction would have to be spelled out. Are the levels of Jurisdiction in the structure of the data exchanged? Then a Table would have to be created. I would have no clue what the international view of Jurisdictions impact would be.

I am sure I missed many things here, I am only reflecting some of the issues that I have seen among the User Community that I participate in.

Thank you,

Russ
DallanQ 2010-11-11T09:12:01-08:00
Here's an idea: what if we started an open-source project to generate a location database from a Wikipedia dump. Unique location ids are based upon wikipedia titles. Location attributes (such as lat/lon) are determined from the infobox templates that appear on most place pages. Jurisdictional relationships are determined from the navigational templates that appear at the bottom of most place pages. The program could be run periodically over a wikipedia dump to create an XML file containing a place database.

Anyone could download the XML file, or use it to create a historic-place name standardizer or geocoder web service.

There are similar efforts out there: dbpedia.org, but since their focus is not just on locations, the location data is pretty messy. The challenge would be to develop parsers for the various types of templates that appear on place pages. You could possibly use JWPL: http://code.google.com/p/jwpl/ to help.
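As a rough illustration of that parsing challenge (this handles only one simple infobox style and is illustrative only, not a real solution; real pages use many different coordinate templates):

import re

LAT = re.compile(r"\|\s*lat(?:itude)?d?\s*=\s*([-\d.]+)", re.IGNORECASE)
LON = re.compile(r"\|\s*lon(?:gitude)?d?\s*=\s*([-\d.]+)", re.IGNORECASE)

def coords_from_infobox(wikitext):
    lat, lon = LAT.search(wikitext), LON.search(wikitext)
    if lat and lon:
        return float(lat.group(1)), float(lon.group(1))
    return None                 # many pages use other coordinate templates

sample = "{{Infobox settlement\n| latitude = 50.7753\n| longitude = 6.0839\n}}"
print(coords_from_infobox(sample))      # (50.7753, 6.0839)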

I volunteer to work on this if others are interested.
greglamberson 2010-11-11T11:50:08-08:00
Russ,

The great thing about this is it's all completely customizable. The systems that are used to map locations aren't so much of concern as the basic concept. We don't have to adopt any one standard, but we could definitely support any number of geographical mapping systems.

Dallan,
That's certainly a wonderful idea. However, as far as an actual system, it seems like something developers of a particular app would be more likely to adopt. However, given this sort of extensibility, they could very easily adopt this sort of thing.

Here's an example of what a place entry could look like for what I mean:

Fields:

location1(L1)= Detailed Location=mother's house
L2=City=Carbondale
L3=County=Williamson
L4=State=Illinois
L5=Country=United States
L6=LatitudeDecimal=37.7216741
L7=LongitudeDecimal=-89.2238760
(location provider 1)LP1=Location Provider 1=Microsoft Virtual Earth
(location provider reference 1)LPR1=Location Provider Reference 1=(whatever this service needs to find and locate the place)
LP2=Location Provider 2=Google Earth
LPR2=Location Provider Reference 2=(whatever this provider needs to find and locate the place)

Also, dates for locations are needed. My database provides this capability. Do the products you work with provide a capability to specify a date for which a location is applicable? This is one appropriate way to deal with some differences in place names. Obviously this won't work for Kashmir, and it's poorly implemented in the US geographic systems I am familiar with, but it's a start.
DallanQ 2010-11-11T14:20:54-08:00
I think the question for you is: do you want to make your users enter this information for each of the places in their gedcom, or are you willing to settle for something less that you can get from a standard repository (like wikipedia)?

I am not aware of a single comprehensive repository of place information that includes historical places, especially when you want dates as well. I've spent a long time looking. That's why I created the places wiki at WeRelate.org. The WeRelate place wiki pages include the ability for users to add date ranges to jurisdictional relationships (and alternate jurisdictional hierarchies), but the date ranges and alternate jurisdictions must be added by hand and it's a monumental task. See for example: http://www.werelate.org/wiki/Place:Aberdeen,_Aberdeenshire,_Scotland

I think you end up with several options:

(a) each user enters this information for the specific places in their own gedcom.

(b) each records manager provider acquires a database of current geographic locations and adds what historic locations and relationships they can think of.

(c) someone develops an open-content current+historical geographic database by starting with an extract from wikipedia and either lets people add historic relationships to it over time, or encourages wikipedians to allow storing historic relationships in the templates on wikipedia pages.
hrworth 2010-11-11T16:32:58-08:00
Dallan,

I am trying to understand users entering information into a GEDCOM file. I hope that you mean entering information into my genealogy database, and that the database helps deal with the Place Name / Location Name.

Right now, my software looks at a website for the current day location name.

If I want to add the historical accuracy, I have to find other resources so that I can apply the Time sensitive location name. I am learning where that historical information is, but sometimes it's difficult.

Your Item C would be great.

Russ
gthorud 2010-11-11T17:21:41-08:00
Since this seems to be the topic for adding requirements about places here are some more, rather simple ones:

1- A place name "record" should store a default prefix, typically a preposition - eg. in or at. It must be possible to override this in an event instance.

2- In many source records a number (or identifier, probably also containing letters) may be used to identify the place, and the same number may occur in many sources. A place may have several such numbers, e.g. because the numbering system has changed over time. The identifier may have several parts. It should be possible to store such multi-part numbers and an identifier for the numbering system. This could also be used to access land record sources via the web - such services exist - and there exist map services where the same numbers are used. The same number may also be part of a nationwide solution for unique land property identifiers that e.g. allows me to find out who owns the property today.

3- User defined flags associated with a place

4- A value that will suppress higher levels when the place name is printed, because the name is not ambiguous and is assumed to be known by the reader.

5- The type of place (Country, City, Parish, Municipality, County etc) must be stored in a place record

6- A place type, e.g. farm, church, house, school etc.

And there are certainly more issues …..
hrworth 2010-11-11T17:47:53-08:00
gthorud,

Trying to understand #1 (default prefix).

Isn't that preposition dependent on how that Place Record is used?

I just want to be able to enter the Place Name and let that place name be transported.

How that place name is used in, let's say, a report generator is when that preposition would be added and/or controlled. Not in the transport of the information.

Please let me know if I am missing something here.

Thank you,

Russ
gthorud 2010-11-11T18:15:46-08:00
Re #1 - I GUESS you assume that you can choose the preposition based on the type of place? Unfortunately that is not the case in all countries/languages. E.g., here we have "at" cities and "in" cities, while English has only "in" cities - as far as I know. The only way to handle in/at is to store the appropriate preposition with the place name. And that info must be transferred to other systems, so one does not have to re-enter it in the receiving program.
greglamberson 2010-11-11T18:45:09-08:00
This thread is hopping.

Dallan,

I think the Location Provider and Location Provider Reference field I mention above should be completely optional rather than anything filled out by users. I envision that, within a software application or service (such as WeRelate.org), a place entered can be mapped to a location page (as is current practice). What I propose is that the place entry within the place database, which would be a table specific to that user's data (this I suppose would have to be generated at werelate.org based upon the places that actually occur within any given user's dataset), would have optional LP and LPR fields as I show above. For werelate.org these fields would be populated with LP1=werelate.org (or whatever you refer to your mapping system as) and LPR1=http://www.werelate.org/wiki/Place:Cincinnati,_Hamilton,_Ohio,_United_States (for Cincinnati, OH). Any provider that any application used could then be appended to this field, but of course this function would have to be adopted by software developers.

The function that I envision for this is largely something that would have to be accommodated by software vendors to be really functional, but it would be totally optional and completely extensible. I wouldn't advocate saying "we recommend using Google Maps" and then providing fields for this service and no other.
gthorud 2010-11-11T20:11:26-08:00
On 11 nov 11:46 GMT+1 Greg wrote.

I believe we should do 2 things:

1. Establish what would essentially be a table of geographic names within BG's structure rather than treating each place entry in each record as a separate entry; and

2. Include referential data fields for several external geographic mapping systems so that once a place has been paired with a place in one of these systems that the process would not have to be repeated after any subsequent exports and imports of data.

Re.1. I agree that there should be a place structure that each occurrence (eg. in an event) of a place/place name/placeona (persona for places) can refer to. This will be one of the most important improvements that we can do – in my opinion. This structure needs further work. Each entry should only occupy one level in the place hierarchy, and may thus have records above it and below it, possibly belonging to several hierarchies.

Re.2. I think that each place in 1. could have several identifiers (possibly multi parts in sequence?) that will be accompanied by an ‘identifier scheme identifier’. There could even be an identifier type associated with it, in case the receiver does not know the scheme identifier.

One use of such an identifier would be to access a map system, another to access source records, etc. (See my entry with issues 1-6 above, especially #2.) One ID may be used for several purposes, or several IDs may be needed; the receiver's application will know, or can be configured to know, what can be done with an ID following a particular scheme. The identifier may implicitly determine if this is a historic place or a current one, but I think a date or time period should be supplied in addition to the IDs. (We are likely to see a historic map and information service in my country next year.) Yet another application of an identifier can be to identify the data record for the place/place name etc, cf. discussions re. UUIDs elsewhere on this wiki.
greglamberson 2010-11-10T06:15:54-08:00
All excellent points. One thing that I personally come back to all the time is the inconsistency with which geographical name changes are accepted and incorporated by various mapping services. I am referring to changes that occur due to undisputed jurisdictional changes such as borders changing in U.S. counties. When you introduce the idea of disputed changes, the problem gets far more complicated.


What are your feelings about Geography Markup Language ( http://en.wikipedia.org/wiki/Geography_Markup_Language )? This tool is hardly a solution for the problems you mention but it is a related tool I feel should be used. (But how?)
xvdessel 2010-11-10T07:57:20-08:00
I agree that GML could certainly be an element of a solution, i.e. to pinpoint a physical location or area. Furthermore, it can be an aid in matching software, as it could allow calculating distances between two records that each have a GML-encoded location, and thus deciding (based on tolerances) whether they could be the same location or not.
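To illustrate the tolerance idea (a minimal sketch, assuming each record's GML has already been reduced to a single representative latitude/longitude point; the tolerance value is arbitrary):

from math import radians, sin, cos, asin, sqrt

def distance_km(lat1, lon1, lat2, lon2):
    # great-circle (haversine) distance between two points, in kilometres
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def could_be_same_place(point1, point2, tolerance_km=5.0):
    return distance_km(point1[0], point1[1], point2[0], point2[1]) <= tolerance_km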

I'm not sure whether undisputed changes are an issue. If a reference database knows, for given entities (e.g. a county) and given date ranges, what the covered area is, then one could deduce one from the other (e.g. from date+GML to administrative location, or from date+location to an approximate GML). But indeed, as you pointed out, a mapping service (or any other reference db) is a prerequisite to be able to work with this.
The difference with disputed changes is that, when using such a reference db to enrich the user's data, depending on the db, you could end up with one or the other of several disputed hierarchical locations. And that also means that matching by hierarchical analysis (which is how most genealogical web searches work) is very tricky. The same applies to undisputed changes if a search or matching is done without any date information (and thus you cannot be certain that both records use the same 'version' of a hierarchical location).
I agree that many of these problems are probably too complex to tackle straight away, but these are things to keep in mind for the longer term.

One key element could be that, for locations, the standard recommends not specifying the location itself, but rather a URI or URN that refers to a reference db (like ISBN numbers do). This can avoid language issues and improve search and matching tools greatly. But who will manage and maintain such a reference db?
greglamberson 2010-11-10T13:07:59-08:00
First off, I certainly believe there should be a location database accommodated within the standard rather than some free-form entry of places. The software I use has a location database, and I would like to avoid re-identifying the locations of all these places each time I do an import/export. Thus URI/URN, long/lat, GPS, etc., location data should be accommodated. What place standards exist that should be accommodated? What does GNIS use? What other systems in various regions or even worldwide exist that we should accommodate?

I firmly believe that if someone has identified a place within their data that the corresponding information for that place should be passed so they don't have to resynchronize their places as is now the case.
greglamberson 2010-11-10T06:09:25-08:00
Discussion regarding how to develop a data model
Taken from the main page to form basis for a discussion on the matter:

From Tom Wetmore:
First step is an overall data model. This can't be done bottom up. I would suggest taking some proposals for models as a starting point. The stuff I see about Person entity below misses the point because it doesn't mention whether we are talking about the mention of a person in an event taken from evidence or if we are talking about an individual we have constructed out of all the research we have done.
hrworth 2010-11-10T11:52:40-08:00
2 - If you export your data for someone else, do you want then to provide that complete reasoning, or only the conclusion data?

I would like to control what I export period.

I may or may not want to exchange / export either of those pieces of information.

If I were a professional genealogist, which I am NOT, I might want to export Only the Conclusions I reached.

If I were sharing with a family member or a newly found cousin, I probably wouldn't share either of your items. The person I am sharing with, may need to draw their own conclusions based on the material they have.

Control / Options to include or exclude is what I would want.

Hope that helps,

Russ
hrworth 2010-11-10T12:00:32-08:00
3 - If you import data, do you only see the imported data as a source (input to your conclusions), or could you be willing to import the complete reasoning as such, and allow it to match with the reasoning you made, e.g. to determine why the same source data leads to different conclusions in the head of someone else?

First, I would want to view that information OUTSIDE of my own file. Initially, I don't want it part of my file.

I would hope that my program would let me completely or selectively import and merge information from that file, into my file.

When I do select to include information, I want to include a complete 'record'. Record may not be the right term.

Taking the "Event" as described in Question 2, I would want the attributes of that Event, included all of the Source-Citation information what was recorded in the file.

I may or may not want the conclusions or information supporting that conclusion. If I were to include that, I would NOT want it to overwrite what I might already have.

I might want to compare the two conclusions and draw a new conclusion, delete a conclusion, combine the conclusions.

Options is the thread here.

Thank you,

Russ
hrworth 2010-11-11T17:31:19-08:00
4 - Do you see export & import only to exchange data with peer researchers, or also to allow people to change from one software to another with a minimal loss of data? It should be clear that each of these uses has clearly other requirements!

I think that answer is yes.

I may or may not know if the person I am sharing my research with is a Peer or not.

Bottom line, Yes I want to be able to share my research with a minimal loss of information.

What happens to the data after it is received, I can't control. I can guess it will be completely accepted, completely rejected or somewhere in between.

I haven't merged a GEDCOM file into my main file for years. I have my "standards" in data entry.

That does not mean that I don't review a GEDCOM file that I receive. Usually what is helpful is the Source of the data in that other file.

Since we are talking about Importing and Exporting, I would further request of my genealogy software developer that each entry that I do include in my file from that GEDCOM file include, in addition to any and all Source / Citation information, the information as to where that information came from. That is, the information that is in the very beginning of that GEDCOM file.

So, let's take an Event. That Event would have a Source-Citation associated with it. I want that included. But, in addition, the Filename, Date of Import, and Originator of that File, in a format that would be consistent with Evidence Explained!

Please don't read this as saying that the Source-Citation for the Event in that file needs to meet that standard, but the citation for the originator of the file should.

Does that help?

Russ
hrworth 2010-11-11T17:35:58-08:00
5 - After you imported data X into your data Y, when you export that for someone else, do you want to flag the X-data as yours, or should it indicate the original source where you got it from? If X contains data Z, do you want to see the complete path via which you obtained it?

Yes, see my answers to Question 4. This is a very good question.

I hadn't really read this question when I answered Question 4, but I can tell you that every once in a while I find an event that does not have a Source-Citation associated with it, which I know I received in an early GEDCOM file, which is why I asked for that new Source-Citation (GEDCOM file origins).

I certainly would want to pass that information along in any export that I would produce. In other words: here is where I found this information. An online source, a physical document, a GEDCOM file.

Russ
hrworth 2010-11-11T17:42:12-08:00
6 - If you store a location, do you want the system to insist on a limited list of values (e.g. all known towns, etc) as that could increase the reuse of your data later on (i.e. to cope with translation and other issues)?

I think I answered this question on the Location discussion.

The offering of a suggested Place Name / Location is helpful, but I may or may not accept it.

I currently accept a Place name based on a current place name, but if I know that at the Time of the Event that Place had another name, I would add an additional, historically based, time-sensitive Place name.

The best example is where I lived in the late 1950s. It has a different name now. So, the place name from when I lived there, in the '50s, is what was entered. But I would also use the current name to take advantage of some other features that the software provides.

I should be saying this on each post. This is just one User's opinion.

Russ
xvdessel 2010-11-12T05:34:37-08:00
Russ,

thanks for your input! I hope many others will follow.

Yes, I'm a genealogist (if I have time for it) and have 15+ years of professional experience in IT & software (banking sector).

I'm living and researching in Europe where very often (period <1750) the information in the archives is very limited (a baptism record only mentions father & mother, godfather & godmother; if you are lucky the priest mentioned in which church the parents were married, but almost never any dates). Hence, the deduction mechanism is of key importance. However, I don't know of any software that allows one to first record 20 separate events and then, one by one, stitch them together into one single conclusion. Typically, the fact of NOT finding something is often also a piece of data (e.g. if an individual has only one birth in the region that matches their name & approximate birth date, the fact that the neighboring towns don't have that name/date combination is an element in deciding which birth act to accept).

By my IT background, I learned the importance of a good model. But I also understand that end users view things from their perspective, and it is finally for them we do the work (including for ourselves when we use software).

And from your answers, it may be clear that your requirements seem to be quite high, which is logical. Whether all that is achievable is another story! But others have already commented on that (E.g. will software vendors use a new standard?).

Xavier
hrworth 2010-11-12T07:36:53-08:00
Xavier,

Thank you for your reply.

I spent 30 years in the communications industry, with the last 10 or so spent helping to write User Requirements for several software systems.

I have spent the last 10 years working with / helping users of a specific Genealogy software program. Some of my requirements, I hope, reflect some of the User Requirements that I have seen in the past.

I do hope that other Users jump in and help with these requirements.

If I don't start high, I won't get there.

I also hope that some more software vendors participate as well. After all, this project's hope is to allow Users, like myself, to share their research with others.

But, I am happy to participate in the development of the requirements.

Thank you,

Russ
brianjd 2010-11-21T19:55:06-08:00
Personally, I think the question being asked is the wrong question. But this is purely from my perspective as a software consultant who has designed systems for clients with varying levels of technical knowledge.

I think the goal should be to involve the participants in a discussion similar to Xavier's list of questions above.

In order to know how to design the system you have to know what your goal and expectations are. You have to know what features you want, what data you need to collect and any actions that need to be performed.

The technical people should help by guiding the questions in constructive ways so that answers to questions help to construct the model. If the questions are posed correctly, the data model builds itself.

For example, I think it is safe to say we know some base objects we need to collect information on. I think it's also safe to say we know some base requirements we have for any data model.

So a good starting point would be to define a base model to which we then add.

For example, we know we have people.
People have events, places, references, etc. associated with them.

We also know we have requirements about how to handle the data. Like the excellent questions and answers provided above.

I'd like to add that I think a great deal of the work and a great deal of the data structure could be gleaned from the Gentech data model. Not that I'm advocating their entire solution, but the data elements are a good source of information. I probably wouldn't implement them the way they did, and I skimmed over a great deal of the cruft before the data definitions. Plus it would be overkill for some people, like me. But then, there are things there that I'd use and there are things in Gramps I don't. (A preview button on this page would be nice.)
hrworth 2010-11-22T03:10:23-08:00
Brian,

The requirements are "simple" from this User's point of view.

I want to Share my family history information, in my genealogy software, with another User, who may or may not use the same software. No data to be lost, and there may be Media within my file.

The project is to lay the ground work for the Transport of that data.

The request is not how to present that data, only to get it from one place to another.

I understand that is not an easy task.

One End User's opinion.

Russ
testuser42 2010-11-22T12:44:46-08:00
Hello Xavier,

thanks for these very good questions. I agree with a lot or nearly everything that Russ wrote.
So maybe I'm just putting the same thing in different words, but here it goes:


- do you want your computer software to insist on storing the complete decision process you made to reach a conclusion in your genealogy?

I want to be able to store the decision process that made me reach the current conclusion. A simple note would suffice, I think.
If I come to a different conclusion later (because of new evidence or new understanding of the old one), I want to save this new conclusion and the process. But I want an option to keep the earlier conclusion stored away as a reminder (with a time-stamp?), so I don't make the same mistake again. If I decide to trash the old conclusions, that should be possible.


- If you export your data for someone else, do you then want to provide that complete reasoning, or only the conclusion data?

Everybody should be able to decide what he wants to export. Privacy might play a role etc.
But to be really helpful, I would export everything (sources/evidence, reasoning, conclusion) about the persons and events that I export. I think the software should default to this behavior, and offer options to leave out parts.


- If you import data, do you only see the imported data as a source (input to your conclusions), or could you be willing to import the complete reasoning as such, and allow it to match with the reasoning you made, e.g. to determine why the same source data leads to different conclusions in the head of someone else?

I would be willing to look at other people's reasoning and conclusions. If I agree, I'll take their conclusions (maybe add a "me, too"). If I don't, I'll take their sources only (if I think they're useful to me).
If I only had conclusion data without reasoning and evidence, that would only be an aid to further research, if at all.


- Do you see export & import only to exchange data with peer researchers, or also to allow people to change from one software to another with a minimal loss of data? It should be clear that each of these uses has clearly different requirements!

Both.


- After you imported data X into your data Y, when you export that for someone else, do you want to flag the X-data as yours, or should it indicate the original source where you got it from? If X contains data Z, do you want to see the complete path via which you obtained it?

The complete path would be ideal.
If I didn't change anything about a source/evidence record, then it might be OK to leave the previous "owner"-info.
If I agree to a conclusion and don't change it, adding my "me, too" might give this conclusion more credibility for future researchers.


- If you do an import from someone who already provided you some data before, do you expect your software to eliminate duplicates (i.e. data that did not change between the previous upload) and flag new data or updates for you to examine?

Yes, that would be very helpful.


- If you store a location, do you want the system to insist on a limited list of values (e.g. all known towns, etc) as that could increase the reuse of your data later on (i.e. to cope with translation and other issues)?

Ideally, the software would offer me a list of previously used places (pop-up or autocompletion as I type?).
Places would come in an evidence and a conclusion version, too. If a source just says "Newton", I can only put that down. If I believe that this Newton is the Newton in C County, I put that and my reasoning behind that into the conclusion place.
If I enter a new place without a source, it is just a conclusion place. I should be able to add a "certainty" to this, just like any other conclusion. Of course I should be able to add sources later.
The conclusion place should also collect data that isn't about any event of a person. Many things just concern and define the place, like a change of name, or government.
I would love a genealogy program that helps me gather more data about a place, like coordinates, historical and current information.
Places could be nested in other Places, like Country - State - County - City - Neighborhood etc. Since these relations are only valid at a certain time, it should be possible to save a time-span with these nestings, and to allow multiple "parent"-places.
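
To make that last idea concrete, here is a minimal sketch (in Python; the names and dates are hypothetical and not taken from any existing program) of a place record that allows several time-bounded "parent" places:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ParentLink:
    # "this place was part of that place", valid for a time span
    parent: "Place"
    from_year: Optional[int] = None   # None = unknown / open-ended
    to_year: Optional[int] = None

@dataclass
class Place:
    name: str
    parents: List[ParentLink] = field(default_factory=list)

    def parents_at(self, year: int) -> List["Place"]:
        # all parent places whose validity span covers the given year
        return [link.parent for link in self.parents
                if (link.from_year is None or link.from_year <= year)
                and (link.to_year is None or year <= link.to_year)]

# Hypothetical example: a town that moved to a different county in 1974
newton = Place("Newton")
newton.parents.append(ParentLink(Place("C County"), to_year=1973))
newton.parents.append(ParentLink(Place("D County"), from_year=1974))
print([p.name for p in newton.parents_at(1950)])   # ['C County']

A place name change over time (like Russ's 1950s example above) could be handled the same way, with time-stamped name variants instead of a single name field.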


Again, thanks for these clever questions. I hope my user's POV may be of a little help to you techie experts who do the hard work :)
brianjd 2010-11-24T11:01:12-08:00
Russ,

It's very apparent that your requirements are anything but simple. But then you might also think that walking on two legs is simple. In reality, very little is simple. First off you want to use "your" genealogy software. Which means any format or tool would have to be able to read the data from your genealogy software. This would possibly involve thousands of hours of reverse engineering of that program's storage mechanism. Or getting your genealogy software maker to write an export feature for the new format. But you have just given us a base specification from which to work.

Sadly, if we take that approach we're likely going to wind up with a model that is huge and cumbersome, as everyone will want to use whichever one they are already using, which would in the end encompass every conceivable format.

But, I think you knew that when you wrote it.

Does your current software record everything the way you like it? Is there nothing missing? Is there nothing that you don't like about it?

Are we talking about building a data model and a program? It would seem to me, one result would be at least some useful code. I've built data models before and the code almost writes itself in the process, if done right.

My ideal is always to start simple.

We need a data model to have any idea where we want to go. This can really only be done element by element. However, it doesn't mean we can't use prefabricated structures and cut out, copy, and paste in the changes we want.

Take person for example: we seem to be of two minds here, one is an "evidence person" and another a conclusion person. Well, let's simplify. You have instead a person. Just a plain old person. If some evidence later points out that that person is two people, then all the model needs to handle is allowing a "split" transaction. You can take any piece of evidence in a person and split it out as another person with the same name. When splitting a person into two, you can choose which bits of data go with that person. Problem solved, and there is no need for an evidence person or conclusion person. You collect evidence on one person and later conclude it is more than one person. Keep it simple. Or at least obvious. After all, if you're researching John Thomas Smith, your goal is one specific John Thomas Smith, not two or three or four. Why start with four and later have to merge them?
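
A rough sketch of that "split" transaction, assuming (purely for illustration) that a person record is just a name plus a list of evidence items:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Person:
    name: str
    evidence: List[str] = field(default_factory=list)   # citations, extracts, etc.

def split_person(person: Person, evidence_to_move: List[str]) -> Person:
    # move the selected evidence items onto a new person with the same name
    new_person = Person(person.name, [e for e in person.evidence if e in evidence_to_move])
    person.evidence = [e for e in person.evidence if e not in evidence_to_move]
    return new_person

jts = Person("John Thomas Smith", ["1850 census", "1870 census", "1851 land deed"])
other_jts = split_person(jts, ["1870 census"])
print(jts.evidence)        # ['1850 census', '1851 land deed']
print(other_jts.evidence)  # ['1870 census']

The merge mentioned as the alternative would simply be the reverse: concatenate the two evidence lists onto one person and discard the other.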

Alternatively, any decent genealogy program should have no problem merging two or more John Thomas Smiths. So users would be free to take either approach. Although, I'd think merging is more work than splitting. Many people have a tendency to take the hardest road, given free rein, myself included.

Of course, the problem is not in the data model, but in the implementation. Of course, I can see the other viewpoint: I have, in one town of about 1,000 people, six Johann George Schmidts all living at the same time, with births spanning a number of years, several wives with the same first and Ruf names, and all with overlapping children. One couple is mine. Another, his parents.

So, I say start simple; we'll complicate it fast enough. No need to make any judgments at this stage.
We need the following type of data: person, place, source, citation, conclusion, event. We could also include: relation and characteristics.
hrworth 2010-11-24T11:16:15-08:00
Brian,

Yes, my request, as ONE User of a software program, is Simple. Get the Data from my Software Package and transport it to another End User, and 'don't mess with the data' in the transport.

To Get there, is not simple.

Does my program do everything I want? Absolutely NOT. BUT, if I start by asking for software limitations, or making the requirement simple, I'll end up with something simple.

The hope is that this BetterGEDCOM group will lay out what the Users really want, and the technical types (not me) can help lay the groundwork for the developers to come to the table to see what the developers can do.

Just so that you know, when I see "evidence person" I stop reading. I have a Person. I am gathering and documenting "who" that person is, through Facts or Events, along with that documentation. That "person" WILL change, based on new evidence. I WILL find conflicting information.

That said, I think you and I agree on this.

Russ
xvdessel 2010-11-10T09:22:56-08:00
I agree with Tom that we need to have at least an idea of the high-level model we try to achieve. How can you talk about "data about events" or "data about a person" if it is unclear which direction you want in your model, or what you mean by "an event" and "a person"?

Therefore, the first key decisions will most likely be around the scope: does the model cover only conclusion data with reference to sources (like GEDCOM does) or do we want to extend it to source data elements, the decision process, and the resulting conclusions.
hrworth 2010-11-10T10:02:00-08:00
Tom,
xvdessel,

I sort of understand what data models are (I don't want to guess at what that means), but I am not sure we are there yet. Clearly, developers need to know or create the data models.

This wiki was created because we, Users of various genealogy applications, have an issue. WE Want to Share our Research with others without losing information, and what is shared is selectable by the user.

Now, if I had a Data Dictionary for the software that I might be using, that is Field Name, its properties, and what it's used for, I might be able to come up with a data model. But, I don't want to do that.

Please help us create that Data Model. Oh, I think, to do that, we need to have developers at the table. That is what we are trying to do.

I am not sure that I care if the end result is a GEDCOM or some other 'thing'. All I know is that the current GEDCOM feature doesn't work the way I want it to.

Please help us.

Thank you,

Russ
xvdessel 2010-11-10T10:54:19-08:00
Russ,

I can agree that an end user really does not care what the software does, or how, to exchange the data from one user to another. That is indeed an issue for the techies.

But that does not remove the burden on the end users to come up with "business requirements". Let me put forward some questions to clarify what I mean by this:

- do you want your computer software to insist on storing the complete decision process you made to reach a conclusion in your genealogy?
- If you export your data for someone else, do you then want to provide that complete reasoning, or only the conclusion data?
- If you import data, do you only see the imported data as a source (input to your conclusions), or could you be willing to import the complete reasoning as such, and allow it to match with the reasoning you made, e.g. to determine why the same source data leads to different conclusions in the head of someone else?
- Do you see export & import only to exchange data with peer researchers, or also to allow people to change from one software to another with a minimal loss of data? It should be clear that each of these uses has clearly different requirements!
- After you imported data X into your data Y, when you export that for someone else, do you want to flag the X-data as yours, or should it indicate the original source where you got it from? If X contains data Z, do you want to see the complete path via which you obtained it?
- If you do an import from someone who already provided you some data before, do you expect your software to eliminate duplicates (i.e. data that did not change between the previous upload) and flag new data or updates for you to examine?
- If you store a location, do you want the system to insist on a limited list of values (e.g. all known towns, etc) as that could increase the reuse of your data later on (i.e. to cope with translation and other issues)?
Each of these questions relates to possible options or key choices in the data model. E.g. the last one could mean that a data model stores a globalized identifier of a location rather than a free text location, but more importantly that locations relate to a centrally managed reference database which could make it language-independent etc.
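
As a minimal sketch of that last option (everything here is hypothetical, not an existing schema): events would carry only a language-independent place identifier, while a shared, centrally managed place table carries the language-dependent names.

# Hypothetical central place reference table, keyed by a language-independent ID
PLACES = {
    "P0421": {"names": {"de": "Lüttich", "fr": "Liège", "nl": "Luik"}, "type": "city"},
}

# An event then stores only the identifier, not free text
event = {"type": "baptism", "date": "1742", "place_id": "P0421"}

def place_name(event, lang):
    # resolve the place ID to a display name in the requested language,
    # falling back to any available name
    names = PLACES[event["place_id"]]["names"]
    return names.get(lang) or next(iter(names.values()))

print(place_name(event, "fr"))   # Liège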

I hope this clarifies a bit why the data model is important.

Xavier
hrworth 2010-11-10T11:00:44-08:00
Xavier,

I'm sorry if you concluded that I think a Data Model is not important. I KNOW it is.

What I will try to do, is to answer your great questions, one at a time, over the next couple of hours. I want to think a couple of them through.

I do have a question for you though. Do you do family research and do you share your research? I ask, because I will then know how to answer the questions that you asked.

Thank you,

Russ

-- Oh, this is the dialog we need here. Thank you.
hrworth 2010-11-10T11:48:26-08:00
Xavier,

1 - do you want your computer software to insist on storing the complete decision process you made to reach a conclusion in your genealogy?

Wow, don't know how to easily answer that question.

The software does 'save' information as I leave a field.

Not sure what a 'complete decision' means to you.

I enter information into fields in the program. Most of the fields are Events. Today, an event might have four attributes or fields. And this is just an example:

Event / Fact Name:
Date of Event:
Place of Event:
Description of the Event:

I may or may not have the three fields completely filled in. I may only have the Event and the Date of the Event.

Doing research, I want to document that Event. Where did I get that information from?

I enter what I have, and cite the source of that information.

No conclusion yet, just recording the event.

I may have the same Event / Fact from various sources, each providing pieces of information.

Each would be recorded separately. If I find the same information, previously recorded, in another source, I'll add that Source-Citation to the existing event.

At some point in time, I need to Evaluate what I have collected for that Event. Following that Evaluation is when I might reach a conclusion.

I may conclude that I can not reach a final conclusion, as I might be missing some element or I might have to resolve some conflicting information.

That is why I am not able to completely answer your question. But, maybe I did.

Thank you,

Russ
DallanQ 2010-11-11T08:57:55-08:00
Evolutionary vs Revolutionary
One thing to keep in mind is that most genealogy record managers are built by small companies with very limited development resources. I think one of the problems with prior efforts in this area is that they were too different from the GEDCOM data model, and it wasn't clear how much the differences were going to help the average genealogist.

If you want the record managers to adopt a new data model, I think you'll want to make it easy for them to adopt (not involve a lot of programming) and you'll want to articulate clear advantages to their customers.

Here's an example of what I mean. I'm not advocating that we shouldn't promote XML; XML is a huge improvement over GEDCOM. This is just an example of a simple change that would give clear benefits.

Currently it's next to impossible to share pictures and other media associated with a GEDCOM, because most GEDCOM files just store media as file paths. So come up with a jar/zip file format for a GEDCOM file and associated media (call it .gar maybe). Then users can create files that contain all of the pictures referenced from a GEDCOM, send those files to their relatives, and their relatives could load the files and have the pictures linked.
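
A minimal sketch of that bundling idea (file names hypothetical; rewriting the FILE paths inside the GEDCOM to point at the archived copies is not shown):

import zipfile
from pathlib import Path

def bundle_gedcom(gedcom_path, media_paths, archive_path):
    # pack a GEDCOM file and its referenced media into one shareable archive
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as gar:
        gar.write(gedcom_path, arcname=Path(gedcom_path).name)
        for media in media_paths:
            gar.write(media, arcname="media/" + Path(media).name)

# Hypothetical usage:
# bundle_gedcom("smith.ged", ["photos/grandpa.jpg", "scans/census1900.png"], "smith.gar")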
DallanQ 2010-11-12T07:10:39-08:00
Here are a few more things that would help improve existing gedcom support:

(1) A standard way (like a _living tag) to mark someone as living. Currently when someone has no dates, in order to determine whether they are living (so we don't publish data about them publicly) we have to start looking at dates on their relatives. It's a pain.

(2) Standardize on name pieces, or at least require full names to be in "name-prefix given-name(s) /surname/ name-postfix" format (see the sketch after this list).

(3) For source citations, standardize on either a text sub-field or a note. Currently both are allowed by the standard. Some programs use both, some use just the text and throw away the note, others use just the note and throw away the text. It makes it nearly impossible to export a gedcom that will work for all programs.

(4) Standardize on a way to represent the type of parent-child relationship: adoption, biological, etc.

(5) You may want to allow people to attach sources and notes to relationships (e.g., child-parent, or person-spouse), in addition to sources on individuals and family objects.
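
Regarding point (2), here is a rough sketch of how a name in that format could be split into pieces; it is only illustrative (a real parser would also need a list of recognized prefixes to separate a "Dr." from the given names):

import re

# "given names /surname/ postfix": the surname is whatever sits between the slashes
NAME_RE = re.compile(r"^(?P<given>[^/]*)/(?P<surname>[^/]*)/(?P<postfix>.*)$")

def split_name(full_name):
    m = NAME_RE.match(full_name.strip())
    if not m:   # no /surname/ part at all
        return {"given": full_name.strip(), "surname": "", "postfix": ""}
    return {piece: m.group(piece).strip() for piece in ("given", "surname", "postfix")}

print(split_name("John Thomas /Smith/ Jr."))
# {'given': 'John Thomas', 'surname': 'Smith', 'postfix': 'Jr.'}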
Andy_Hatchett 2010-11-12T07:24:59-08:00
"(4) Standardize on a way to represent the type of parent-child relationship: adoption, biological, etc."

At the risk of setting off a firestorm...

I've said this before and I'll say it again... there is only one type of genealogical parent-child relationship: biological. Any other type, such as adoptions, fosterings, et al., are mere social relationships and *not* genealogy, and as such have no place in genealogical output such as charts, reports, etc. They are family history; genealogy and family history are *not* the same thing.
xvdessel 2010-11-12T07:53:16-08:00
Andy,

To avoid your expected firestorm, I think I can say that many researchers don't agree with you, for at least the following reasons:
- many children are born from another biological father than the one who claims (or even thinks) he is the biological one. This was true in the past, and still is (proven by research). Hence, unless you have a very extensive DNA test budget, forget reaching back 300 years in a purely biological way.
- What people become is a combination of nature (biological relations) and nurture. Many researchers value the latter, even if the former may be more important.
- Even biologically speaking, there is an issue if you take surrogacy into account. A child can be born to carrier mother X under a surrogacy agreement, grow up within the family of parents Y and Z, but biologically inherit its genes from U and V. And wait till science can combine exotic pieces of DNA from different people to build tailor-made babies.

Hence, I think point 4 still stands as a point for most of us.

Xavier
hrworth 2010-11-12T08:05:23-08:00
Andy,

Thank you for the firestorm topic. I do understand what you are saying.

The thought of a BetterGEDCOM was for Sharing of Information. So, it's a way to transport information from my Genealogy software to you and your software.

The issue of Family History vs Genealogy is interesting. I would throw in another Group to the mix, 'Name collectors'. Please, we don't need to take a tangent on that firestorm.

The difference, I think, is not the Transport of information, nor the software that is used, be that online or on a computer you and I might use. It would be how You and I use our Software.

So, for the purpose of the BetterGEDCOM project, you and I need control over what we see or bring into our software. In other words, WE need options on what we USE. The transport needs to be transparent to the use of the data that is included in the transport.

If I am close in the three categories above, we need to be able to share what we have.

Let me throw "One Name Projects" into the Name collector category, just for discussion. I know of a One Name Project that I would like to see information from. When the BetterGEDCOM project is implemented (at some point in time), I would want that One Name Project file sent to me. I would open that file in my software. During Import, I would want to control what is brought into my new file on my computer.

Let's say that in the file, during import, I find the ancestor I have been looking for. I'd like to be able to Identify that specific person, Bring in ALL of his ancestors, siblings, spouse(s), and descendants, and toss the rest. That is what I want to look at. I may or may not bring that information into my own file. But that would mean that I have ALL of the information, most specifically research material (Source-Citations), and not lose anything in the file that was created by the One Name project.

One step further, in reviewing that One Name project file, I see another Ancestor, who I know about, but is in a different family line. So, I would want to go back into that One Name project file, and export another set of Individuals, into a new / different file.

I am not a Name collector, nor the Genealogist that you are. I have become more of the Family Historian that you mention.

I won't address the Name collector group, but for you (and I do hope you reply to this): You should be able to control and see ONLY the Genealogical Parent-Child relationships that you want to see. We, the folks helping define our requirements, should be able to "toss" those social relationships that you mention and not have them included when imported. So, the marking of relationships is very important.

I would just add, that I would want to see a list of those 'tossed' relationships at the completion of the opening of that file.

All that to say is that we need to have the flexibility to transport data collected by the End User.

One User's opinion.

Russ
Andy_Hatchett 2010-11-12T09:31:18-08:00
Russ,

What I am saying is that yes, we do need a way to show social relationships; but that way should *not* be by tricking the software into believing that that relationship is in any way genealogical in nature. In other words, a child adopted by someone should *never* be shown as a child of the person who adopted them.

Presently there is no way to do this except with notes, unless one links the child as an actual child of the adoptive parent.

We need other linkages that clearly define these social relationships and that will *always* show these distinctions, be it in charts, family group sheets, narratives, or whatever, so that a step, foster, or adoptive relationship can never, under *any* circumstances, be mistaken for an actual biological relationship.

That is what I'm asking for.
hrworth 2010-11-12T09:51:25-08:00
Andy,

To me, that is a software requirement, not one for the vehicle that transports the data. The transport has to be transparent and not make any changes to what is presented.

It's in the export or import of the data, in my humble opinion, where your requirement goes. By the way, I support what you are asking; I just don't want the BetterGEDCOM to do that.

Russ
xvdessel 2010-11-12T10:01:08-08:00
Andy,

then I can clearly follow your point. And indeed, we may need a wider range of relation types.

However, I don't think it is up to the standards for data exchange to define how users want to view and/or print the relationships that are maintained by the system. A godfather also has a relation, albeit a different one from a parent. It should be left to the software to decide which options it gives to the end user about this. Some software will keep it simple; others may indeed use a specific attribute (line style, ...) to indicate the particular relationship or could even allow the user to filter out non-biological relationships.

But we agree on the bottom line that the exchange standards need ways to define all types of relationships.

Just out of curiosity: in the case of a carrier mother who brings a baby into the world from an implanted embryo, whom do you flag as the biological mother? The carrier mother or the woman who gave her egg cells to build the embryo?
And let's take it a bit further: like Dolly, one could take a fertilized egg, remove all chromosomes and replace them with the chromosomes of an adult person. Note that the mitochondrial DNA is probably still that of the fertilized egg, not of the adult donor. Who would be the mother here? And the father? I know it is the future, but when scientists did frog cloning, nobody believed it was possible for mammals.

Xavier
Andy_Hatchett 2010-11-12T10:35:52-08:00
Xavier,

In my mind, it is always the "sperm Father" and the "egg Mother" of the fertilized egg who are the biological parents, no matter what may be done to or with that fertilized egg at a later date.
AdrianB38 2010-11-14T15:03:39-08:00
Andy - I wholly support the idea that the exact relationship of a child to a so-called parent should be capable of exact definition.

My only point would be that BG _must_ accommodate family historians not just genealogists. Over in the UK many (most?) of us consider ourselves family historians, looking at the wider aspects of the family in all its forms.
greglamberson 2010-11-14T15:27:10-08:00
Adrian,

Limiting myself to your comment, "...BG _must_ accommodate family historians not just genealogists."

Personally, I use the term "genealogist" to encompass both concepts. The thought of one without the other is repulsive to me.

I have lately begun to think of genealogy as the collection of the data and family history as the analysis of that data, but other than that, I am unable to draw a distinction.

Rest assured, the needs of anyone considering themselves either a genealogist or a family historian are exactly what we should focus on. You'll get no disagreement there.
dsblank 2010-11-16T05:36:29-08:00
Re: Dallan's evolutionary GEDCOM extension suggestions:

I've put the 6 mentioned here, with comments. This should be an entire project. This may be more important, and have a larger impact, than the BG project itself.

(0) A standard way to represent their UUID. I think that this has evolved to be _UID.

We also need a way to represent a set of these. If two persons are later found to be the same person, then we need a way to represent that, and to show which is the primary UUID. We have thought some about this and how it can be used to later merge information back (say, after two researchers have made edits).

(1) A standard way (like a _living tag) to mark someone as living. Currently when someone has no dates, in order to determine whether they are living (so we don't publish data about them publicly) we have to start looking at dates on their relatives. It's a pain.

This is an interesting bit of information, somewhere between evidence and conclusion. Gramps has a very sophisticated program to compute living status (also used for computing estimated births and deaths, which can be added and marked as "calculated" dates). But a living tag is interesting (we use a living mark in our web-based app, because computing living status can be expensive). The living designation might need to have a justification (linked to the person's dates that provided the evidence, in case they change). Also, it should have a current date attached to the living-status guess, so that the date can be taken into account later. Perhaps add estimated DOB and DOD, if missing. (A rough sketch of such a guess appears after this comment.)

(2) Standardize on name pieces, or at least a requiring full-names to be in "name-prefix given-name(s) /surname/ name-postfix" format.

Yes, these distinctions are lost when we go to GEDCOM. Gramps has one of the most sophisticated name representations of any application. Just recently we added multiple surname support. But most is lost in GEDCOM export.

(3) For source citations, standardize on either a text sub-field or a note. Currently both are allowed by the standard. Some programs use both, some use just the text and throw away the note, others use just the note and throw away the text. It makes it nearly impossible to export a gedcom that will work for all programs.

How could this be done?

(4) Standardize on a way to represent the type of parent-child relationship: adoption, biological, etc.

Ideas?

(5) You may want to allow people to attach sources and notes to relationships (e.g., child-parent, or person-spouse), in addition to sources on individuals and family objects.

Yes, another thing that Gramps XML handles, but is lost in GEDCOM. Ideas?

-Doug
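
Picking up point (1) above: this is not the Gramps algorithm, just a minimal, hypothetical sketch of the kind of living guess being discussed, returning the justification and the as-of date along with the flag so they can be stored with it:

from datetime import date

MAX_LIFESPAN = 110   # assumed upper bound, in years

def presumed_living(birth_year=None, death_year=None, today=None):
    # crude guess: no death recorded and born within MAX_LIFESPAN years
    today = today or date.today()
    if death_year is not None:
        return False, "death recorded in %d" % death_year, today
    if birth_year is None:
        return True, "no dates at all; assumed living to be safe", today
    if today.year - birth_year > MAX_LIFESPAN:
        return False, "born %d, more than %d years ago" % (birth_year, MAX_LIFESPAN), today
    return True, "born %d, within %d years" % (birth_year, MAX_LIFESPAN), today

print(presumed_living(birth_year=1890))   # (False, 'born 1890, more than 110 years ago', ...)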
GeneJ 2010-11-19T07:22:27-08:00
@AdrianB38,

Modern genealogy recognizes extended families. See _Numbering your Genealogy_ for established standards (US) about recognizing adoptions, step-children, etc.

Now, how well modern genealogical software is able to support those standards, well, that is probably a whole other party.

Hope this helps. --GJ
greglamberson 2010-11-11T18:22:46-08:00
Dallan,

We certainly want to plan for the future and encourage adoption of best practices, but we are most immediately concerned with the practical ability to import and export data. Here's the core of the immediate project's problems to solve as I imagine them right now:
1. Can we accommodate a conclusion-based data model and an assertion/theory model at the same time?
2. If these two cannot be accommodated elegantly within the same data model, where should the line be drawn between the need to capture data from existing products and the elements that are desirable for adhering to good genealogical methodology regarding theories and building proofs for assertions?
3. What are the implications of the above on software developers who are tasked with making today's genealogical software programs map to this data model? Do we need to tackle less ambitious goals to accommodate practical development problems? Should we provide mere stubs for certain data classes (e.g., theory, assertion stubs for evidence/source data class) to signal to developers a path for future development without burdening them with wholly new data constructs to accommodate?

Regarding multimedia formats that could be compressed, this concept which GRAMPS uses has already been suggested and adopted in discussions elsewhere today.
gthorud 2010-11-11T18:40:01-08:00
The multimedia transfer problem has been discussed in at least three other topics, see the discussions Goal2 under Goals, and 'XML for Multimedia ...' under Home - and there is a third one I can't find.
DallanQ 2010-11-12T05:04:54-08:00
I was just using that as an example of something that you could propose that would have clear benefits to users without creating a lot of extra work for the records-manager vendors, and therefore be more compelling for the records-manager companies to adopt.

The challenge that you face here is that I don't see any of the major records managers participating. Without their buy-in, this effort will suffer the same fate as previous efforts in this area have over the past 10 years.

You need to convince at least some of the major records-manager vendors that your changes will be worth spending their development resources to implement. I think your most important goal is:

What changes can we make to the existing gedcom model that will provide clear benefits to users with a minimal amount of implementation effort required, so that our changes have a chance of being adopted by the records-manager vendors someday?

Until you address that goal, this is just an academic exercise.
DallanQ 2010-11-12T05:43:35-08:00
Let me give you an example of what you're up against with this effort to create a better gedcom.

Attaching a globally-unique ID to every person in a GEDCOM is a good idea. It means that when I share my GEDCOM with a relative or with an online service like new FamilySearch or WeRelate, and then I want to send an updated version of that GEDCOM later, it's easy for the online service or the records-manager used by the relative to determine which people have been updated, deleted, or added, because everyone in my GEDCOM is identified by a globally-unique identifier. This makes merging the changes much easier and more accurate. So we have clear user benefit.

The code to create a globally-unique ID is not difficult to write. It's less than a page. FamilySearch has even made example code available. So we have "easy to implement".

This idea has been around for a long time and several records-manager vendors have adopted it. By convention, they store the ID in a (non-standard) gedcom tag called _UID. So we don't face the problem of: "If I do it and nobody else does, there's no benefit". Others are already doing it, so there's clear motivation to the remaining records managers to add it.

Given the above, what percent of GEDCOM's do I see with _UID tags? You'd think it would be pretty high. It's less than 40%.

Furthermore, some of the GEDCOMs with _UID tags don't actually store globally-unique IDs in them. There are some records managers that allow the GEDCOM user to store whatever text they want in the tag, and it's not required to be unique even within the GEDCOM, much less globally. So in order to take advantage of the _UID tag, I have to first determine if this is from a vendor that stores true globally-unique IDs in that tag, or one that allows users to store whatever they want in that tag.

Just getting vendors to store globally-unique IDs in a _UID tag in the existing GEDCOM file format would be a big win.

It's not as much fun to identify the current user-facing problems with GEDCOM, figure out how to fix them and work with the vendors to adopt your changes, as it is to come up with an entirely new file format. But if you want to have an impact, that's where you need to begin.
dsblank 2010-11-12T05:59:02-08:00
I couldn't agree with DallanQ more.

Perhaps a useful role that people here could play is creating a smooth path from current GEDCOM to BG, whatever it may be. For example, if we could all make sure that our current GEDCOMs supported an agreed-upon set of extensions (like _UID) then that would be a big step in moving forward.

Gramps is set to add a _UID (and related infrastructure) to its system. Any advice is welcomed. Anything else that we should all advocate to make common extensions?
xvdessel 2010-11-12T06:16:57-08:00
Maybe (but if I remember well, this idea was already mentioned years ago in some of the historical GEDCOM-related efforts, and I even think I commented on it back then) we should build a leveled standard whereby a software package can claim to respect the standard up to a given level.
We could say that level 1 is the current GEDCOM level, with only basic data covered. Level 2 could then be a more strict GEDCOM-like adherence, e.g. including things like Unique IDs, more exotic types of events, etc.
Level 3 would then be a new BG-level, but still at the level of conclusion data. It could include new concepts related to locations, date formats, event types and roles, character sets, gay marriages, Surrogate mothers, ...
Level 4 would then enter the raw event data and decision process.

To validate software, a set of files could be published for each level, whereby the test involves importing the test data into the software, performing a predefined data search (when and where was person X born?) and data entry (enter a new child for parents X and Y), and exporting all (or part) of the data again in the required format. After that, a publicly available program can compare the result file to a sample output file and give a percentage score.

I believe similar mechanisms are used to benchmark browsers and many other software.
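
A toy sketch of what that comparison could look like (a real validator would compare parsed records rather than raw lines; this only shows the shape of the idea):

def conformance_score(reference_path, result_path):
    # percentage of reference lines that appear in the exported result file
    with open(reference_path, encoding="utf-8") as f:
        reference = [line.strip() for line in f if line.strip()]
    with open(result_path, encoding="utf-8") as f:
        result = set(line.strip() for line in f if line.strip())
    if not reference:
        return 100.0
    hits = sum(1 for line in reference if line in result)
    return 100.0 * hits / len(reference)

# Hypothetical usage: conformance_score("level2_expected.ged", "vendor_export.ged")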

Xavier
greglamberson 2010-11-12T06:49:55-08:00
Dallan,

We have significant support from several major vendors. I wouldn't expect to see them participate actively, because they're commercial entities, and frankly, they won't. Who pays software developers to take part in completely open community efforts? Who allows their developers to announce their participation in an effort such as this? But they're certainly watching, and we are certainly considering that aspect of things very carefully. Don't worry about that. We're on it. However, you're not going to see some big logo that says "SuperGenealogySoftware Pro 2010 Approved" here.

Regarding your UUID example, I think this reflects the fact that a great many users use old software that is simple and works but may not be under current development. This phenomenon is significant, but it is also not of great concern. GEDCOM is very useful to many people, and while many of us think it's horrible, it actually works just fine for a great number of people. Every program can import GEDCOM, so why would existing applications simply abandon an import/export engine that works and is already developed? It doesn't concern me that GEDCOM may still be in use in 5 or 10 years by the troglodytes. Heck, they're using software that's essentially 15 years old now! (And we all know who they are.) However, this phenomenon also won't hamper adoption by vendors with current products that are interested in remaining competitive in the commercial space, where value-add is imperative.

xvdessel,

We certainly intend to have different standards that deal with different aspects of genealogical technology. We will thus definitely have different levels of compliance that adhere to different standards modules, as it were.
DallanQ 2010-11-12T06:50:24-08:00
I found the reference to the FamilySearch provided code:

http://www.mail-archive.com/ldsoss@lists.ldsoss.org/msg00695.html

I can't believe it's been 4 years already. How time flies :-)

Java and other programming languages also have built-in functions for generating UUIDs.
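
In Python, for instance, generating such an identifier is a one-liner; the exact formatting convention used inside a _UID tag varies by vendor, so the GEDCOM line below is only illustrative:

import uuid

person_uid = uuid.uuid4()                      # one identifier per person record, generated once
print("1 _UID " + person_uid.hex.upper())      # e.g. 1 _UID 3F2504E04F8941D39A0C0305E82C3301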
greglamberson 2010-11-16T09:11:41-08:00
Person names, identity and analysis
I have just blogged about the lack of an analysis layer in genealogy software databases of today:

http://genmadscientist.wordpress.com/2010/11/16/who-is-john-smith-adventures-in-genealogical-data-modeling/

I'll attach this as a document to the main page.
brichcja 2010-12-02T01:52:36-08:00
XML format for census data?
Hi all. This is slightly off topic, but not irrelevant, and I suspect that people here will know the answer....

Is there a standard XML format for representing historic census data? It strikes me that the rigidly hierarchical nature of the old censuses maps very well onto the tree structure of the XML-DOM model. I appreciate that all the people out there who provide census data to the web, be it amateurs just extracting one name for their own hobby, or the LDS, must have some kind of database structure to store it, but is there a standard public-domain version out there? If everybody started sharing their census data in the same way, it would have all the same benefits as the BetterGEDCOM; in fact, it would even feed into it.

Cheers,

Chris
dsblank 2010-12-02T04:16:35-08:00
Chris,

Gramps has a census XML format that it has been developing, but only for the actual census columns. The data is stored in the Gramps database on events connected to each person. It would be very easy to extract the census data from the people and create an XML file with headers, though. In fact we have a report that does that, except it goes to text rather than XML.

For more information see:

http://gramps-project.org/wiki/index.php?title=Census_Addons
http://gramps-addons.svn.sourceforge.net/viewvc/gramps-addons/trunk/contrib/Census/
http://gramps-addons.svn.sourceforge.net/viewvc/gramps-addons/trunk/contrib/Census/census.xml?revision=461&view=markup

Hope that helps,
-Doug
brichcja 2010-12-02T06:51:06-08:00
Thank you. I'll have a look and see if I can make any sense of it.

Chris
mstransky 2010-12-16T09:53:19-08:00
SFT.xml Model
What is the consensus on tracking the GEDCOM FAM record? I have heard people argue for it and against it.

Currently I have two models that work like so:

1.) A PID which captures the individual's FatherID and MotherID. Any time a person is picked, the Father and Mother IDs act as the filter.
Any other individuals in the PID with matching parents are displayed as siblings along with the selected individual.
This can also match the parents and display any other children linked to those parents outside of this family structure as well.
This uses no GEDCOM FAM records.

2.) A PID which does not capture the mother and father ID; a second db, FID.xml, is used. The FID captures the Male and Female in a family bond ID.
The PID line set has a Child-of ID = FID record set. Once an individual is selected, the child-of ID is captured and matched against the FID; the FID family ID is displayed, pulling the father display, the mother display, and any matching individual children of this group as siblings.
Also, each parent ID pulls up matches of other marriages or BONDS between two individuals, to display those not married, or children conceived outside of a marital bond. This uses the GEDCOM FAM record but only needs the male and female ID; no child ID is needed because the FID acts as a match to the child-of ID in the PID.xml.

I prefer #2. This also allows a researcher or end user to create a place-marker on a bond between two people, with a start and an end, even if they had no children. These are preferred input fields that are desired but are not RECORDS. An EID.xml will capture the marriage record and link to this place-marker file. Also, it benefits ...

GEDCOM is the closest to #2, but #2 has better flexibility in matching and less dual linking throughout a GEDCOM database.
AdrianB38 2010-12-16T14:31:32-08:00
My feeling is that the consensus drifted towards the family entity being just a sub-type of GROUP. GROUP is needed (by some people) for all sorts of things like business partnerships, and can also serve as the entity type for families, presumably with some sort of sub-type to identify them.

Alongside this, my view of the consensus is that things like birth / adoption events involved the child and parents / step / adoptive parents, each with a defined role in the event and this gave you the exact relationship, instead of relying on this woolly "Child in a Family" concept which ignored that one could be a step-child of one parent and a biological child of the other.

BUT - since these were my personal preferences, I could be misinterpreting the consensus.
hrworth 2010-12-16T15:13:36-08:00
Mike,

My question would be "how do you handle relationships"?

What is your model's definition of a Family?

If you have a "family of 3", Mother, Father, Child, how are these handled beyond the PID?

Russ
mstransky 2010-12-16T17:06:07-08:00
@ Adrian,

You bring up a very good point: the old GEDCOM INDI and FAM areas were mainly for generic default displays, navigation, and placeholders for just a handful of records.
Over the years many users were forced to reuse those areas for other purposes like adoption, business, bonds between two people, and step-children. Many created sub-classes inside this area, making it more complex than it has to be.

My FID (parents of a family) and PID (person individuals) function the same way as the INDI and FAM records, BUT:
PID holds no records.
FAM only stores bonds between two individuals, with no children listed.

Here is a basic FAM.xml. Just for example's sake, say the node 'f' is for father and 'm' for mother; other data, like bond begin and bond end with places, follows.
This is data selected by the user as a default; it is not a record but a place-marker for quick view. Say 'g' is for group, or family, or bond, whatever people view it as. Displayed are seven groups.

<FIDxml>
<data>
<set><g>1</g><f>123</f><m>24</m><!-- all the user's generic display data --></set>
<set><g>2</g><f>123</f><m>22</m><!-- all the user's generic display data --></set>
<set><g>3</g><f>124</f><m>87</m><!-- all the user's generic display data --></set>
<set><g>4</g><f>67</f><m>54</m><!-- all the user's generic display data --></set>
<set><g>5</g><f>87</f><m>102</m><!-- all the user's generic display data --></set>
<set><g>6</g><f>87</f><m>36</m><!-- all the user's generic display data --></set>
<set><g>7</g><f>2</f><m>95</m><!-- all the user's generic display data --></set>
<!-- many, many more entries here -->
</data>
</FIDxml>

Now in the PID
<PIDxml>
<data>
<set><p>67</p><c>2</c><!-- all the user's generic display data --></set>
<!-- many, many more entries here -->
</data>
</PIDxml>

OK: Person 67 is a child of family/group #2; from here you can navigate to his mother (#22) and father (#123).

If you choose to select person #123, you will note he has bonds to other people: women #22 and #24.

That is for navigating upwards through records.
Downwards: say the man #123 was chosen.
The FAMxml pulls up two matches:

display [G#1, matched with woman #24]
filter PID: match any for g=1
display [G#2, matched with woman #22]
filter PID: match any for g=1

So while looking at a person's family, the siblings listed can display the nuclear family of four matching siblings AND a step-child from a bond, by marriage or blood, from either spouse.
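
For the techies following along, a minimal sketch of that navigation over small in-memory FID/PID subsets (the g/f/m values are taken from the FID sample above; the PID rows beyond person 67 are invented for illustration):

FID = [  # group id, father PID, mother PID
    {"g": 1, "f": 123, "m": 24},
    {"g": 2, "f": 123, "m": 22},
    {"g": 4, "f": 67,  "m": 54},
]
PID = [  # person id, child-of group id
    {"p": 67, "c": 2},
    {"p": 68, "c": 2},
]

def parents_of(person_id):
    # upward navigation: person -> child-of group -> father & mother
    group = next(x["c"] for x in PID if x["p"] == person_id)
    bond = next(x for x in FID if x["g"] == group)
    return bond["f"], bond["m"]

def siblings_of(person_id):
    # everyone sharing the same child-of group
    group = next(x["c"] for x in PID if x["p"] == person_id)
    return [x["p"] for x in PID if x["c"] == group and x["p"] != person_id]

def bonds_of(person_id):
    # downward navigation: all groups where this person appears as a parent
    return [x["g"] for x in FID if person_id in (x["f"], x["m"])]

print(parents_of(67))    # (123, 22)
print(siblings_of(67))   # [68]
print(bonds_of(123))     # [1, 2]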

However, Adrian brings up a good point: adopted children with unknown genetic-parent place-markers.
Note that, in the same way, the PID loops to the FID up and down via navigation, capturing all children and step-children.

With Adoption, EID.xml works the same way.
An EID Marriage event can tag back to a PID as HUSB; likewise, the WIFE role of the same event links back to the female. The same goes for a legal Adoption record.

These are not records but default, user- or researcher-preferred display markers. The real thing is the EID.xml, where you enter records of various classes and types. Each EID line points back to PIDs, linking adoptions to a group, marriages between two persons, and witnesses to the same event. I will get into that shortly. I want to show the FID and PID grouping first and show that the navigation does work. I did a functional test on my site and can display pedigree, household, and descendants very easily.
mstransky 2010-12-16T17:08:18-08:00
Error correction:
display [G#2, matched with woman #22]
filter PID: match any for g=2

#22 was supposed to be g=2, not 1.
mstransky 2010-12-16T17:16:41-08:00
Here, for example: my sister had two children by two separate fathers.
She is the PID in question, #161:
http://www.stranskyfamilytree.net/gen%20project/view-family/family.asp?pid=161

Both children were put up for adoption, and I have a way to handle that. Adrian, I know what you mean; this was one of my pet peeves also.

The EID.xml captures the event "Adoption".
Each person plays a role: birth mother, birth father if known, the child, and each of the adopting parents. One source record goes in the SID.
Each EID entry, for roles and data from the source doc, points to EACH of the corresponding PIDs.

That is how you link many people to one event: flag EID line sets as confirmed or disputed, or to review later. This is how you import/export records individually, not mash them inside the INDI. PIDs have no records in them; PIDs are just visual place-markers for navigation, with a collection of records tied to a group point per person.

LOL I am babbling again!!!