This is my attempt to set down an overall goal for Better GEDCOM and a high-level statement of Better GEDCOM's requirements -- Tom Wetmore
The goal of the Better GEDCOM project is to design a file format that can be used both to archive comprehensive genealogical data and to transfer genealogical data between persons, websites, and applications.
1. The syntax of Better GEDCOM files shall be a non-proprietary format (e.g., XML, GEDCOM, JSON, …, or a custom application-specific format).
2. The data model that underlies Better GEDCOM must be a superset of the models used by existing genealogical applications to the fullest extent deemed possible during design.
3. The data model that underlies Better GEDCOM must provide a set of data entities that will allow genealogical applications to support all conventional genealogical processes.
4. The character set used by Better GEDCOM files must be UTF-8 encoded Unicode.
5. Better GEDCOM files may contain references to external information, which may be identified by URIs on the web or held in container files that accompany the Better GEDCOM files.
6. Better GEDCOM must not impose restrictions on field lengths or value formats except as deemed necessary during design.
7. Better GEDCOM must provide a means to mark-up text that is used in contexts that allow unstructured text (e.g., notes).
Notes on Requirements.
1. The final syntax of Better GEDCOM files is not specified by these requirements. It will probably be XML, but at this point in the process it’s not an important issue.
2. If Better GEDCOM does not fully encompass the data models used by existing applications, significant data from those applications will be lost during exports to Better GEDCOM files. If this happens, Better GEDCOM will not be relevant.
3. Not only must Better GEDCOM support the data models of existing applications, it must also provide the data model for future applications that will fully support genealogical research processes.
4. Unicode is the universally accepted solution for handling the multitude of modern, historical, and ancient character sets used by all human cultures. UTF-8 is the most common byte encoding of Unicode and is supported by all modern software development environments.
5. Better GEDCOM must handle multi-media information. That information takes the form of resources on the web or files in locally accessible file systems. When data from a genealogical application is exported to Better GEDCOM files, the files may contain references to web resources or references to files that are simultaneously exported to an accompanying container file. In fact, it would be best for a Better GEDCOM file to be a zip file that contains both the Better GEDCOM text and the accompanying multi-media files; this guarantees that the two elements stay synchronized. Applications may give their users the option to include or exclude local multi-media files when exporting data.
6. Better GEDCOM must be as flexible as possible. Formats for dates, places, and names must be as free, open, and unrestricted as possible. Fixed formats should be eliminated as much as possible.
7. There are two possible mark-up topics concerning Better GEDCOM. The first is the semantic mark-up of content. The Better GEDCOM data model will handle the overall semantic mark-up scheme in the set of tags and elements it defines. The second topic is the one addressed by this requirement -- it is the stylistic mark-up of text for appearance in reports and outputs. This mark-up can be accomplished using HTML- or RTF-style tagging in the fields that hold unstructured text.
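The zip container suggested in note 5 can be illustrated with a short Python sketch. It is only one possible arrangement, not a proposed specification: the member name "data.bgedcom", the "media/" prefix, and both function names are hypothetical, since these requirements do not define a container layout. The text is stored as UTF-8, as requirement 4 asks.

```python
import io
import zipfile

# Hypothetical member names; the requirements do not define a container layout.
TEXT_MEMBER = "data.bgedcom"
MEDIA_PREFIX = "media/"


def build_container(text, media):
    """Pack the Better GEDCOM text and its media files into one zip archive.

    text  -- the Better GEDCOM text as a str
    media -- dict mapping a file name to that file's raw bytes
    Returns the finished archive as bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Requirement 4: the text is written as UTF-8 encoded Unicode.
        archive.writestr(TEXT_MEMBER, text.encode("utf-8"))
        for name, payload in media.items():
            archive.writestr(MEDIA_PREFIX + name, payload)
    return buffer.getvalue()


def read_container(blob):
    """Unpack an archive made by build_container into (text, media)."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        text = archive.read(TEXT_MEMBER).decode("utf-8")
        media = {
            name[len(MEDIA_PREFIX):]: archive.read(name)
            for name in archive.namelist()
            if name.startswith(MEDIA_PREFIX)
        }
    return text, media
```

An application that lets the user exclude local multi-media files on export would, in this sketch, simply pass an empty dict as the media argument.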
Stylistic mark-up of this kind is entirely up to the application that creates the BetterGEDCOM file, and then, at "the other end", up to the application that imports a BetterGEDCOM file containing such items.
Example: Reunion's Notes fields allow Styled Text, which Reunion outputs in several of its printed reports, but when a GEDCOM file is created, all the styling disappears.
But Legacy, if I'm not mistaken, does allow HTML encoding of some styling in the Notes to be written to the GEDCOM file if the user chooses the "Keep embedded formatted codes within text" option.
But Reunion of course does allow me to include HTML tags in the notes (they just look ugly in any view that doesn't use a browser to parse those tags), so I can use things like <a href and <b> in Reunion to get the desired appearance of my data when viewed with a web browser in TNG.
For example, the title
A Popular Nonagenarian
is in the Reunion Notes field, and appears in the GEDCOM file as
0 @N409@ NOTE
1 CONT <b>A Popular Nonagenarian</b>
So currently the combination of TNG and the web browser does allow for the second part of this (reading the styling out of a GEDCOM file and displaying it as expected), and at least Legacy does allow text styling to be inserted into a GEDCOM file as HTML.
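To make "the other end" concrete, here is a rough Python sketch of what an importing application might do with the NOTE record quoted above: reassemble the note text from its lines, then either hand the HTML to a browser untouched or strip the tags for a plain-text view. The line handling is a deliberately simplified reading of GEDCOM (level, optional cross-reference, tag, value, with CONT starting a new line and CONC continuing the current one), and the function names are hypothetical, not taken from any of the programs mentioned.

```python
from html.parser import HTMLParser


def note_value(lines):
    """Reassemble the text of one NOTE record from its GEDCOM lines."""
    text = ""
    for line in lines:
        _level, _, rest = line.partition(" ")
        if rest.startswith("@"):
            # Skip the cross-reference id, e.g. @N409@.
            _xref, _, rest = rest.partition(" ")
        tag, _, value = rest.partition(" ")
        if tag == "CONT":
            text += "\n" + value  # CONT begins a new line of the note
        elif tag in ("NOTE", "CONC"):
            text += value  # CONC continues the current line
    return text


class _TextOnly(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(text):
    """Return the note text with any HTML tags removed."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


record = [
    "0 @N409@ NOTE",
    "1 CONT <b>A Popular Nonagenarian</b>",
]
styled = note_value(record)          # still carries the <b> tags
plain = strip_markup(styled).strip()  # for views that cannot render HTML
```

A browser-based viewer such as TNG would use the styled value directly; a plain-text report could use the stripped one, which avoids the "ugly" raw tags mentioned above.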
To some extent evidence objects will fly in the face of this. If you have a scan you'll need to say whether it's a TIFF, PNG, etc. Even then, you're not mandating how it's to be presented, just what it is.
This has always been my main argument against formatted citation strings as data elements in BG. For example, if BG decides to use XML as its base-level syntax, then Elizabeth Shown Mills templates could be implemented using XSLT stylesheets.
However, text markup is not, per se, stylistic. The original purpose of HTML was to specify content, not style. This is the title, this is a paragraph, this is a list, this is an element in a list. This is not stylistic information; it is structural content. There is no mention of font, point size, or paragraph indenting here, just structural content.
I am not arguing strongly for text markup in BG here, but simply pointing out that it is not necessarily inconsistent with BG's goals of holding genealogical data.