The de facto GEDCOM standard comes in several versions, and all the standard versions are backward compatible. Given that, it seems unproductive to go into the minutiae of older versions. This page discusses the commonly recognized versions, 5.5 and, to a lesser degree, 5.5.1, along with draft versions such as the XML-based 6.0 draft and the GEDCOM (Future Directions) draft.
If you have knowledge of previous versions and feel that discussion of their development or other considerations would be of value, please add your thoughts and information.
Much of the data provided here is from the FamilySearch Developer Network for Software Programmers GEDCOM Wiki Page. Given that they are the original developer of this standard, I highly recommend study of their resources.
GEDCOM Specifications
GEDCOM versions (that have been used by genealogy software):
- GEDCOM 5.5 of 2 January 1996: (PDF Version [official document]) (HTML Version [not official but seems complete; provided for convenience]) - Most widely accepted specification of GEDCOM.
- GEDCOM 5.5.1 DRAFT of 2 October 1999 - This specification introduced nine new tags, including WWW, EMAIL and FACT, and added UTF-8 as an approved character encoding. This draft has not been formally approved, but its provisions have been adopted, at least in part, by a number of genealogy programs, and it is used by FamilySearch.org. It is the de facto standard.
- GEDCOM 5.4 DRAFT of 21 August 1995 was used by a few programs.
- GEDCOM 5.3 DRAFT of 4 November 1993 was used. Many GEDCOMs still exist that were created using this draft.
XML Versions (never used)
- GEDCOM 5.6 DRAFT of 18 December 2000, never released to the public, includes the GEDXML format. See also Tamara Jones' Analysis of GEDCOM 5.6.
- GEDCOM XML 6.0 DRAFT of 28 December 2001. Included is a cover letter from 23 January 2002 stating that this draft (beta) version of GEDCOM 6.0 was released for developers to study only: it was not a complete specification, and developers were advised not to begin implementing it in their software. For example, descriptions of the meaning and expected contents of tags were not included. GEDCOM 6.0 was to be the first released version to store data in XML format, and was to change the preferred character set from ANSEL to Unicode. There is an earlier version of the same draft from 2 October 2000.
- GEDCOM "Future Direction" Document of 7 July 1999 was a document presented for the purpose of fostering discussion toward the future direction of genealogical data communication.
- GEDCOM XML Letter: "Will GEDCOM Be Replaced By XML?" (date unknown, but it followed the 6.0 DRAFT) describes the LDS work to move towards XML and includes a clear and interesting GEDXML Data Model diagram.
Other:
GEDCOM Global Unique Identifier (GUID) of 8 June 2007
Use of ANSEL vs. UTF-8 Character Set
The GEDCOM 5.5 specification uses ANSEL as its preferred character set, which is obsolete. The intermittently implemented 5.5.1 draft adds UTF-8, the encoding that is now nearly universally used by computers. This issue must be resolved in any new standard, and UTF-8 should clearly be the character set it adopts.
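Because a file can arrive in either encoding, a reader usually has to check what the file declares before decoding the rest of it. The following is a minimal sketch, assuming the header carries a level-1 `CHAR` line (as in `1 CHAR ANSEL` or `1 CHAR UTF-8`) and that the header lines themselves are plain ASCII. The function name `declared_charset` and the sample data are hypothetical and exist only for illustration.

```python
def declared_charset(lines):
    """Return the value of the header's CHAR line, or None if absent.

    Assumes each line has the form: level [xref] tag [value].
    """
    in_header = False
    for raw in lines:
        # Drop a leading byte-order mark and surrounding whitespace.
        parts = raw.lstrip("\ufeff").strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level, tag = int(parts[0]), parts[1]
        if level == 0:
            if in_header:
                break  # next level-0 record: the header has ended
            in_header = (tag == "HEAD")
        elif in_header and level == 1 and tag == "CHAR":
            return parts[2].strip() if len(parts) > 2 else None
    return None


# Hypothetical sample file contents, for illustration only.
sample = [
    "0 HEAD",
    "1 GEDC",
    "2 VERS 5.5.1",
    "1 CHAR UTF-8",
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "0 TRLR",
]
print(declared_charset(sample))
```

Note that Python's standard library does not ship an ANSEL codec, so a reader that finds `ANSEL` here would need a third-party or hand-written decoder. That extra burden is part of the case for UTF-8.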
Use of GEDCOM 5.x File Syntax vs. XML Language Syntax
The file syntax of all GEDCOM versions in actual use (i.e., 5.5.1 and below) is the same, and all those versions are consequently backward compatible (except for some largely inconsequential discontinued attributes). This file syntax is unique to GEDCOM, and it contains weaknesses that would make its future use ill-advised. Moreover, the GEDCOM syntax is based on a hierarchical data model, just as XML is, so developers should find the move to XML easy to understand and to carry out. XML is a widely used standard language, and adopting it for genealogical use could result in many more applications being available to the genealogist as developers discover how easily their products can be adapted.
The downside to using XML is that BetterGEDCOM (BG) immediately ceases to be backward compatible. Tools could easily be developed to convert old GEDCOM files to the new BG format, but the two formats would be irrevocably separate. This is probably inevitable, but it is important to note this future incompatibility prominently and explicitly.
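To illustrate how directly the level-numbered GEDCOM hierarchy maps onto nested XML elements, and what a simple conversion tool might look like, here is a minimal sketch. It assumes each line has the form `level [@xref@] tag [value]` and ignores details such as CONT/CONC continuation lines and character-set handling. The function name `gedcom_to_xml`, the `gedcom` root element, and the `id` attribute are hypothetical choices for this example; they are not taken from any GEDCOM XML draft or from a BetterGEDCOM design.

```python
import xml.etree.ElementTree as ET


def gedcom_to_xml(lines):
    """Nest GEDCOM lines into an XML tree using their level numbers."""
    root = ET.Element("gedcom")  # hypothetical wrapper element
    stack = [root]               # stack[n] is the parent for level-n lines
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        parts = line.split(" ", 2)
        level = int(parts[0])
        xref = None
        if len(parts) > 1 and parts[1].startswith("@"):
            # Record line with a cross-reference id: level @XREF@ TAG [value]
            xref = parts[1]
            rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
            tag = rest[0]
            value = rest[1] if len(rest) > 1 else ""
        else:
            tag = parts[1]
            value = parts[2] if len(parts) > 2 else ""
        del stack[level + 1:]    # close elements deeper than this level
        elem = ET.SubElement(stack[-1], tag)
        if xref:
            elem.set("id", xref.strip("@"))
        if value:
            elem.text = value
        stack.append(elem)
    return root


# Hypothetical sample data, for illustration only.
sample = [
    "0 HEAD",
    "1 CHAR UTF-8",
    "0 @I1@ INDI",
    "1 NAME John /Doe/",
    "1 BIRT",
    "2 DATE 2 JAN 1996",
    "0 TRLR",
]
root = gedcom_to_xml(sample)
print(ET.tostring(root, encoding="unicode"))
```

A mechanical conversion like this only changes the syntax. A real BG format would still need its own data model, and the resulting files would not be readable by GEDCOM 5.x software, which is the incompatibility noted above.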
(The following was cut from a separate page on this WIKI and that page is now unlinked. The contents are better placed here. Original was by Greg Lambertson. Feel free to edit).
GEDCOM 5.5 Discussion And Detail
Notes from reading through the GEDCOM 5.5 Standard
Cover Letter
"Needed changes that would cause major [in]compatibilities with prior implementations were postponed until release 6.0, some time in the future. Minor 5.x releases may occur in the interim." This tells me that even by the release of the 5.5 standard, FamilySearch recognized that the existing GEDCOM syntax imposed limitations that could not be fixed within its framework.
Introduction
The specification describes the standard at a low level in Chapter 1 and a high level in Chapter 2. The lower level is called the GEDCOM data format, and the high level is called a GEDCOM form (there can be many GEDCOM forms). The GEDCOM form for the purposes of the standard is specified as the Lineage-Linked GEDCOM Form.
This table should be useful in seeing how these sections correspond with our future efforts.
| | GEDCOM 5.5 | BetterGEDCOM (BG) |
|---|---|---|
| Data Syntax (low level) | Chapter 1: GEDCOM Data Format | XML syntax |
| Data Model (Genealogical Structure of data) | Chapter 2: Lineage-Linked GEDCOM Form | XML BetterGEDCOM namespace |
As illustrated by this table, BetterGEDCOM (BG) will benefit from using the existing XML data format by not having to define a custom low-level data syntax.
Chapter 1
Chapter 2