This is my attempt to set down an overall goal for Better GEDCOM and a high-level statement of Better GEDCOM's requirements -- Tom Wetmore
The goal of the Better GEDCOM project is to design a file format that can be used both to archive comprehensive genealogical data and to transfer that data between persons, websites, and applications.
1. The syntax of the Better GEDCOM files shall be a non-proprietary format (e.g., XML, GEDCOM, JSON, …, or a custom application-specific format).
2. The data model that underlies Better GEDCOM must be a superset of the models used by existing genealogical applications to the fullest extent deemed possible during design.
3. The data model that underlies Better GEDCOM must provide a set of data entities that will allow genealogical applications to support all conventional genealogical processes.
4. The character set used by Better GEDCOM files must be UTF-8 encoded Unicode.
5. Better GEDCOM files may contain references to external information, which may be located by URIs on the web or stored in container files that accompany the Better GEDCOM files.
6. Better GEDCOM must not impose restrictions on field lengths or value formats except as deemed necessary during design.
7. Better GEDCOM must provide a means to mark up text that is used in contexts that allow unstructured text (e.g., notes).
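As an illustration of requirement 4, a reader of Better GEDCOM files could reject any input that is not valid UTF-8 before parsing it. The following is only a minimal Python sketch. The function name `decode_better_gedcom` is hypothetical, and nothing in the requirements prescribes how an application should perform this check.

```python
def decode_better_gedcom(raw: bytes) -> str:
    """Decode file bytes as strict UTF-8 (hypothetical helper, not part of any spec)."""
    try:
        # errors="strict" makes the decoder raise instead of silently replacing bytes.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # Report where the first invalid byte sequence starts.
        raise ValueError(f"not valid UTF-8 at byte offset {exc.start}") from exc
```

A file written in a legacy single-byte encoding such as Latin-1 would typically fail this check as soon as it contains a non-ASCII character, which is the point at which an importing application could ask the user for the original encoding.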
Notes on Requirements.
1. The final syntax of Better GEDCOM files is not specified by these requirements. It will probably be XML, but at this point in the process it’s not an important issue.
2. If Better GEDCOM does not fully encompass the data models used by existing applications, significant data from those applications will be lost during exports to Better GEDCOM files. If this happens Better GEDCOM will not be relevant.
3. Not only must Better GEDCOM support the data models of existing applications, it must also provide the data model for future applications that will fully support genealogical research processes.
4. Unicode is the universally accepted solution for handling the multitude of modern, historical and ancient character sets used by all human cultures. UTF-8 is the most common byte encoding of Unicode and is supported by all modern software development environments.
5. Better GEDCOM must handle multi-media information. That information takes the form of resources on the web or files in locally accessible file systems. When data from a genealogical application is exported to Better GEDCOM files, the files may contain references to web resources or references to files that are simultaneously exported to an accompanying container file. In fact it would be best for a Better GEDCOM file to be a zip file that contains both the Better GEDCOM text and the accompanying multi-media files; this guarantees that the two elements stay synchronized. Applications may give their users the option to include or exclude local multi-media files when exporting data.
6. Better GEDCOM must be as flexible as possible. Formats for dates, places, and names must be as free, open and unrestricted as possible. Fixed formats should be eliminated as much as possible.
7. There are two possible mark-up topics concerning Better GEDCOM. The first is the semantic mark-up of content. The Better GEDCOM data model will handle the overall semantic mark-up scheme in the set of tags and elements it defines. The second topic is the one addressed by this requirement -- it is the stylistic mark-up of text for appearance in reports and outputs. This mark-up can be accomplished using HTML- or RTF-style tagging in the fields that hold unstructured text.
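The zip container suggested in note 5 can be sketched in a few lines of Python using the standard `zipfile` module. This is only an illustration under stated assumptions: the entry name `data.bgedcom`, the `media/` folder, and both function names are hypothetical, since these requirements define no container layout.

```python
import io
import zipfile
from typing import Dict, Tuple

TEXT_ENTRY = "data.bgedcom"   # hypothetical name for the Better GEDCOM text
MEDIA_PREFIX = "media/"       # hypothetical folder for accompanying files


def build_container(text: str, media: Dict[str, bytes]) -> bytes:
    """Pack the Better GEDCOM text and its media files into one zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The text is stored as UTF-8, in line with requirement 4.
        archive.writestr(TEXT_ENTRY, text.encode("utf-8"))
        for name, content in media.items():
            archive.writestr(MEDIA_PREFIX + name, content)
    return buffer.getvalue()


def read_container(blob: bytes) -> Tuple[str, Dict[str, bytes]]:
    """Unpack an archive produced by build_container."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        text = archive.read(TEXT_ENTRY).decode("utf-8")
        media = {
            entry[len(MEDIA_PREFIX):]: archive.read(entry)
            for entry in archive.namelist()
            if entry.startswith(MEDIA_PREFIX)
        }
    return text, media
```

Because the text and the media travel in one archive, a reference inside the text to `media/portrait.jpg` (again a made-up name) can be resolved against the same file. An export option to exclude local media would simply pass an empty dictionary.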