GEDCOM X
GEDCOM X is being designed to be an open standard for genealogical data communications.
The following information is taken from the GEDCOM X About page:
A Long Time Coming
A lot of things have evolved with genealogical technology since the original GEDCOM format was specified. Genealogical applications aren't just about making conclusions anymore. Indeed, a much more sound research philosophy focuses more on records and evidence than it does on making conclusions. Furthermore, with the advent of powerful search engines, software as a service (SaaS) offerings, and social networking applications, the legacy GEDCOM model just isn't going to cut it anymore.
Up through 2010, FamilySearch had been busy with other important things, like online access to their huge collection of records. But around 2010 a lot of notable events--including the sprouting of some impressive standardization efforts--came together to raise the priority of a new GEDCOM. By the end of RootsTech 2011, it became clear that the community needed something new...
Project Scope
GEDCOM X has a much broader scope than did legacy GEDCOM. The scope of legacy GEDCOM was primarily limited to allowing users to make conclusions about genealogical information, and provided only superficial support for citing evidence and sources. Legacy GEDCOM was primarily designed to be saved as a file to the hard drive of an isolated desktop computer and never considered the needs of other data providers, like an online web application.
GEDCOM X is designed to continue to meet the requirements accounted for by legacy GEDCOM. This means GEDCOM X will provide support for genealogical conclusion data and a file format. But GEDCOM X expands on the original scope of legacy GEDCOM to include concepts such as:
- extracted record data
- sound sources and citations
- search results
- images, audio, and video
- research logs
- attribution
- long-term persistent identifiers
- genealogical metadata
- genealogical semantic markup
- etc.
The scope of GEDCOM X also includes support for standardization of the APIs that can be used to work with genealogically relevant data. These interfaces are based on the same principles that made the World Wide Web a success and provide an industry-standard way to do things like:
- search for data in a repository
- modify conclusions in an online pedigree
- link to online records
- supply genealogical metadata for an online artifact
- etc.
Models and Profiles
GEDCOM X is neatly partitioned so that developers can easily use the pieces they need without having to swallow the entirety of the specification. The data is divided into different Data Models that define the genealogical data types and their properties.
But the GEDCOM X specification defines not only the data models used to describe genealogical data; it also defines a set of APIs that describe standard operations on genealogical resources. The API specifications are divided into different Application Profiles that are intended to address specific sets of well-defined requirements and use cases.
To read about the different GEDCOM X data models, see the data model documentation.
To read about the application profiles, try starting with the developer's guide.
Here for the Long Term
GEDCOM X is designed for the long term. Through solid design principles and active community support, GEDCOM X is the standard mechanism to establish a rich and collaborative environment for the noble work of genealogical research.
Links to GEDCOM X pages
gedcomx.org - the project page
gedcomx.net - the community page
github.com/FamilySearch/gedcomx - the project repository
familysearch.github.com/gedcomx/atom.xml - the project blog (atom feed)
github.com/FamilySearch/gedcomx/wiki - the project wiki
github.com/FamilySearch/gedcomx/issues - the project issue tracker
Links to Articles about GEDCOM X
Glimpses of GEDCOM X - Randy Seaver, 2012 02 07
Ryan Heaton: A New GEDCOM - the Ancestry Insider, 2012 02 04
FamilySearch releases GEDCOM X - Tamura Jones, 2012 02 02
GEDCOM X - Tamura Jones, 2011 12 12
These rely on the W3C encoding for dates and times, which in turn relies on specific locale-neutral date forms from the ISO 8601 standard. See http://www.w3.org/TR/NOTE-datetime.
The problem for genealogical usage is that many registrations occur on a quarterly basis. In order to represent such registration dates accurately, the notation must be capable of addressing yearly quarters too. This is a glaring omission in ISO 8601 and directly affects all genealogical and family-history usage.
STEMMA was aware of this shortcoming and introduced the date form yyyy-Qd (e.g. 1956-Q2), which is both compatible with and entirely in keeping with the existing ISO date standard. See http://www.parallaxview.co/familyhistorydata/home/document-structure/event/dates.
FHISO are currently talking to ISO/TC 154 about incorporating this format in a revision of the ISO 8601 standard.
Tony
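As a rough illustration of the yyyy-Qd form described above, the following Python sketch parses a value such as 1956-Q2 and expands it to the first and last day of that quarter. The function names (parse_quarter, quarter_bounds) are hypothetical and come from neither STEMMA nor ISO 8601. The sketch assumes a four-digit Gregorian year and calendar quarters beginning in January, April, July and October.

```python
import re
from datetime import date, timedelta

# Hypothetical pattern for the yyyy-Qd form: four-digit year, "-Q", quarter 1-4.
_QUARTER = re.compile(r"^(\d{4})-Q([1-4])$")


def parse_quarter(text: str) -> tuple[int, int]:
    """Split a string such as '1956-Q2' into (year, quarter)."""
    match = _QUARTER.match(text)
    if match is None:
        raise ValueError(f"not a yyyy-Qd quarter date: {text!r}")
    return int(match.group(1)), int(match.group(2))


def quarter_bounds(year: int, quarter: int) -> tuple[date, date]:
    """Return the first and last day of the given calendar quarter."""
    first_month = 3 * (quarter - 1) + 1
    start = date(year, first_month, 1)
    # The quarter ends the day before the next quarter starts.
    if quarter == 4:
        next_start = date(year + 1, 1, 1)
    else:
        next_start = date(year, first_month + 3, 1)
    return start, next_start - timedelta(days=1)


year, quarter = parse_quarter("1956-Q2")
first_day, last_day = quarter_bounds(year, quarter)
print(first_day.isoformat(), last_day.isoformat())
```

Because the year comes first and the quarter digit last, values in this form also sort correctly among themselves as plain strings, which is the same property the numeric ISO forms have.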
In general, there is no algorithmic way of converting dates between all the calendars used worldwide now and in the past. This means the stored computer-readable date has to be stored in its original calendar.
What is needed - in addition to a calendar-name property - is an equivalent to ISO 8601 for each calendar. That must provide locale-neutral fields (i.e. numeric only) in sorting order. This is what the W3C (& STEMMA) representations use for their Gregorian dates.
Unfortunately, this requirement is not of mainstream interest and so I'm not aware of any such standard. I am hoping FHISO can do something here in conjunction with ISO, when it gains a bit more weight that is.
Just for clarification: GEDCOM-X cannot "fix" this issue. It could work around the deficiency, as did STEMMA, but that's not the right move. FHISO are trying to achieve a revision to ISO 8601 which would then percolate down to the W3C date/time representations.
Tony
Tony,
At Ryan's presentations at RootsTech, he did specifically state that any concerns and issues regarding GEDCOM X should be placed on the project issue tracker at github:
https://github.com/FamilySearch/gedcomx/issues
He says that's where issues re GEDCOM X will be addressed and discussed.
By the way, at Geir's request, I've created a page here on the BetterGEDCOM wiki with introductory info about GEDCOM X. You'll find it under "Data Models" at:
http://bettergedcom.wikispaces.com/GEDCOMX
Louis
Louis
Have either of you read some of the US Library of Congress work on extended date/time standards?
Here's a link to the "Extended Date/Time Format" standards webpage. You can link to the Draft Specification from there.
http://www.loc.gov/standards/datetime/
IMHO, it's a pretty horrible specification that mixes up a whole bunch of different issues such as ranges, precision, and uncertainty.
It makes no reference to alternative calendars and so is not relevant to the exchange between myself and Adrian.
The most interesting part was years > 4 digits but I already had a planned (2nd) proposal for ISO to incorporate this as a calendar extension of the 8601 standard.
Tony
And we can consider the DeadEnds approach to dates, via the link to its description on the DeadEnds model page. Or with direct link:
http://bartonstreet.com/deadends/DateFormats.pdf
This approach has one wonderful attribute beyond that of being fully computer processable -- it is also human readable and writeable because, gosh darn it, it's human understandable, and, glory be, it's exactly what you'd write naturally anyway.
Of course, because it must, it also handles the uncertainties, the ranges, the downright awkwardnesses that can be present in genealogical dates.
I once again strongly question the wisdom of applying strict, restricted, standards to what is a sloppy, humanistic data domain. I've been blowing this horn for decades, and slowly losing the battle to the young geeks. Odd to say as I am among the geekier of the geeks. But I'm a wise old geek.
My initial thought was that having original and formal values would seem a sensible way of combining your view (with which I am much in sympathy) and Tony's - the original could contain the sloppy human text and the formal would contain a "proper" date _where_possible_ (e.g. "a year after the Norman Conquest" does NOT equate to 1067 in my book - 1067 +/- 1y would be closer to the truth). My suspicion is that in many cases, such a conversion will simply not be possible.
I have no idea whether this failure to map causes the GEDCOMX people a concern or not. Anyone any idea what their expectation is????
"Tony's view" (for what it's worth) is that both are required. STEMMA has a DATA_ATTRIBUTE called 'Original' for exactly this purpose.
Tony
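To make the "original plus formal" idea in the posts above concrete, here is a minimal Python sketch of a date value that always keeps the text as written and carries a formal value only where one can be justified. The class and field names (RecordedDate, original, formal_year, plus_minus_years) are invented for illustration. They are not taken from GEDCOM X, and they are not STEMMA's actual 'Original' attribute.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordedDate:
    """Hypothetical date value: source text plus an optional formal year."""

    original: str                      # the date exactly as written in the source
    formal_year: Optional[int] = None  # None when no conversion is possible
    plus_minus_years: int = 0          # symmetric uncertainty around formal_year

    def year_bounds(self) -> Optional[tuple[int, int]]:
        """Earliest and latest possible year, or None without a formal value."""
        if self.formal_year is None:
            return None
        return (self.formal_year - self.plus_minus_years,
                self.formal_year + self.plus_minus_years)


# The example from the discussion: 1067 +/- 1y rather than a bare 1067.
conquest = RecordedDate("a year after the Norman Conquest", 1067, 1)
# A phrase with no defensible formal value keeps only its original text.
vague = RecordedDate("shortly before the harvest")

print(conquest.year_bounds())
print(vague.year_bounds())
```

The point of the design is that a missing formal value is a legitimate state, so software never has to invent a date in order to store the record.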
I don't have any insight on the GEDCOMX date format. I've followed the loops in the documents also. It seems a Date is a Field, a Date is also composed of DateParts, and DateParts are also Fields. That seems to be the extent of information now available. I would expect that there will be a format forthcoming that defines the legal values that the contents of those fields may have. It may even exist and we may just be unable to find it yet!
I agree with you and Tony that both an original form and a processed form may be necessary in some situations. I don't go as far down the road as Tony does in prescribing the format of the processed forms to strict standards. I must admit to thinking of the "processed" form more as a sorting key than as an actual date. That is, I tend to add the processed form to a record in my database only when the original form is not adequate for my (fairly sophisticated) date parser to be able to figure it out well enough to use it as a sort key.
Tom
TFP (The Family Pack - my program and database effort http://thefamilypack.org ) stores dates as a single integer, with a range to indicate uncertainty. This has a number of advantages: it simplifies sorting and calculations, it is culturally neutral, and it can be displayed in any format you like (provided you have an algorithm for it).
Re Tony's comment on converting dates between all the calendars used worldwide now and in the past. I don't believe the picture is as bleak as you paint it - provided you stick to the period of time the calendar was in official use then the available algorithms are pretty good and AFAIK are reversible, meaning you can convert back and forth between different systems with no loss of data.
Probably the most difficult aspect of historical dates is the variable year change. For instance, sometime after the twelfth century England began to use the 25th of March as the start of the new year; this continued until 1752 and the change to the Gregorian calendar. In Scotland, however, the new year has been the 1st of January since 1600 (the Scottish celebration of Hogmanay goes back a long way). Other European countries have had different year starts. The TFP solution to this is to have Local Calendars which can account for both the local year change and the changeover from Julian to Gregorian calendars. This gives us the English Calendar and Scottish Calendar etc as options for describing a given date.
The danger in this approach is that it can lead to a false sense of precision, but I think it important to be clear about what is being shown on a particular document, even if we later go on to round things off in our conclusions.
Nick
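The idea of storing a date as a single integer with a range, and of converting reversibly between calendars, can be sketched with Julian Day Numbers. The code below is not TFP's implementation. It is an illustration using the widely published integer formulas for the Julian and Gregorian calendars, and the DayRange class and all function names are hypothetical. Years use astronomical numbering, and the formulas are only intended for non-negative day numbers.

```python
from dataclasses import dataclass


def gregorian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a (proleptic) Gregorian calendar date."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a Julian calendar date (1 January year start)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def _unpack(f: int) -> tuple[int, int, int]:
    """Shared tail of the day-number-to-date conversion."""
    e = 4 * f + 3
    h = 5 * ((e % 1461) // 4) + 2
    day = (h % 153) // 5 + 1
    month = (h // 153 + 2) % 12 + 1
    year = e // 1461 - 4716 + (14 - month) // 12
    return year, month, day


def jdn_to_gregorian(jdn: int) -> tuple[int, int, int]:
    """(year, month, day) in the Gregorian calendar for a day number."""
    return _unpack(jdn + 1401 + (((4 * jdn + 274277) // 146097) * 3) // 4 - 38)


def jdn_to_julian(jdn: int) -> tuple[int, int, int]:
    """(year, month, day) in the Julian calendar for a day number."""
    return _unpack(jdn + 1401)


@dataclass(frozen=True, order=True)
class DayRange:
    """Hypothetical stored date: earliest and latest possible day number."""
    earliest: int
    latest: int


def gregorian_year_range(year: int) -> DayRange:
    """A date known only to the year becomes a range covering that year."""
    return DayRange(gregorian_to_jdn(year, 1, 1), gregorian_to_jdn(year, 12, 31))


# Julian 8 February 1751 and its Gregorian equivalent share one day number.
jdn = julian_to_jdn(1751, 2, 8)
print(jdn, jdn_to_gregorian(jdn), jdn_to_julian(jdn))
```

An exact date is simply a DayRange whose two ends are equal, and ranges sort by their earliest day. The arithmetic is lossless only for calendars with a fixed rule, which is why it says nothing about the harder cases raised elsewhere in this thread.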
Re: Conversions, I'm talking about the general case. Usually, the issues arise between Christian and non-Christian calendars.
Here's one such argument that was presented to me on a genealogy forum:-
"...even today the Hindu calendar is not merely an ancient calendar, it's in common usage -- Government of India documents carry three dates : the Christian date, and two disparate Hindu dates. For the past 50-few years, that's provided at least an officially correspondant dating. Prior to that ... unreliable for extrapolation or even interpolation".
Otherwise, I entirely agree that the "stored date" (as opposed to the original textual version) must have an associated calendar property.
Tony
It's difficult to see where these possible small differences would actually become a problem. Provided you document what's going on, it's probably the best we can do.
Nick
Can you confirm what you would expect this to achieve? Is it a matter of gaining support for the proposed change?
Tony
- what about dates represented originally in non-UTC calendars (e.g. French Revolutionary dates, English regnal years, lots of other non-Western calendars)
- what about Julian dates? The date 8 February 1750 is ambiguous. If it's a date copied exactly from a source in England, then it should be represented as 8 February 1750/51 or 8 February 1750 OS or 8 February 1751 NS to resolve the ambiguity. (If it's an interpretation of such a date with none of that stuff, then it truly is ambiguous). BUT it's still a Julian date and would not map to the Gregorian(?) date 1751 02 08 UTC.
If it's a date copied exactly from a source in France then I _think_ it's not ambiguous because by that time France had moved onto a New Year starting 1 Jan and the Gregorian calendar but dates centuries earlier in France would have the same issue.
I suspect the first issue can be resolved by switching to UTC for the formal representation but needing to record the textual version in the original format.
The 2nd baffles me somewhat because I've no idea what UTC means taken back that far in history to when Julian dates could be found AND when the New Year was not 1st Jan (which is NOT the same issue - just normally linked).
Tell me if I'm jumping the gun here, but Tony's point rang bells.
Adrian
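The 8 February 1750 ambiguity can be shown in a few lines of Python. This sketch assumes the date was copied from an English source that used a civil year beginning on 25 March, and it only resolves the year number. It does not convert between the Julian and Gregorian calendars. The function names and the year_start parameter are hypothetical.

```python
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def historical_year(written_year: int, month: int, day: int,
                    year_start: tuple[int, int] = (3, 25)) -> int:
    """Year on a 1 January reckoning for a date whose written year
    follows a civil year beginning on year_start (month, day)."""
    # Days before the civil new year still carry the previous year's number.
    if (month, day) < year_start:
        return written_year + 1
    return written_year


def dual_dated(written_year: int, month: int, day: int,
               year_start: tuple[int, int] = (3, 25)) -> str:
    """Format a date with dual-year notation where the two reckonings differ."""
    label = f"{day} {MONTHS[month - 1]} {written_year}"
    hist = historical_year(written_year, month, day, year_start)
    if hist != written_year:
        # Abbreviate the second year unless the century changes as well.
        same_century = hist // 100 == written_year // 100
        label += "/" + (f"{hist % 100:02d}" if same_century else str(hist))
    return label


print(dual_dated(1750, 2, 8))           # 8 February 1750/51
print(dual_dated(1750, 3, 25))          # 25 March 1750
print(dual_dated(1750, 2, 8, (1, 1)))   # 8 February 1750
```

Passing a year start of (1, 1) models a source that already counted years from 1 January, in which case the written year is left alone.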