What are we trying to accomplish?
There are many possible objectives which may be advisable, desirable or even admirable. They may not easily fall into the current project, however, due to technical reasons; because they have more to do with defining core genealogical methodologies; or for other reasons. Below are discussions of goals and reasons why they may or may not be a good fit into the same project.
GOALS
Based upon information provided on this page and others, we will attempt to define the needs and establish a set of goals for the BetterGEDCOM project.
- BetterGEDCOM will be a file format for archiving and exchange of genealogical data. [Developers Mtg 3 Jan 2011 status (approved)] Text moved to BetterGEDCOM Requirements Catalogue
- BetterGEDCOM should have the following encoding and syntax characteristics [First BetterGEDCOM Developers Meeting status (tabled)] [New discussion topic started]: Text moved to BetterGEDCOM Requirements Catalogue
- BetterGEDCOM should define data relating to the study of genealogy. The definitions will describe the XML-based syntax and also be embodied in a data model. The definitions will be capable of extension by software companies and users. The coverage of the types of genealogical data will allow faithful import of data from all current, common genealogical software with no material manual intervention, subject to the limits of the applications involved. [being moved to discussion] [First BetterGEDCOM Developers Meeting status (tabled)] [New discussion topic started] Text moved to BetterGEDCOM Requirements Catalogue
- The BetterGEDCOM project should provide a test suite of data that will allow software suppliers and users to assess compliance of software, diagnose issues and assist in their resolution. [First BetterGEDCOM Developers Meeting status (tabled)] [New discussion topic started] Text moved to BetterGEDCOM Requirements Catalogue
- BetterGEDCOM will support recording of information about real life without bias toward any specific belief system. [First BetterGEDCOM Developers Meeting] Text moved to BetterGEDCOM Requirements Catalogue
- BetterGEDCOM will actively encourage the best practices of scholarly genealogy.
- BetterGEDCOM should define just one way of doing one thing. More than one way creates ambiguity and extra work for programmers, who then have to handle every variant. BetterGEDCOM's definitions should be general enough to handle all cases, but in just one way. Text moved to BetterGEDCOM Requirements Catalogue
Discussions About Current Goals
Goal 1: BetterGEDCOM should remain a file format technology capable of serving as a data archival repository
BetterGEDCOM's initial, immediate goal is to serve as a file format in which users can store their genealogical data, independent of any software application, for storage, safekeeping or transport. Data stored in BetterGEDCOM format can be placed on media such as a flash drive, DVD or anything similar. BetterGEDCOM does not seek to be a permanent, efficient data store or to replace the data storage scheme that any software developer uses. The ability to store personal genealogical information in this way is a characteristic of the old GEDCOM standard, and we absolutely agree that this feature should be retained for users.
Goal 2.1: BetterGEDCOM should use an XML-based syntax
Goal 2.2: BetterGEDCOM should use the Unicode character set in UTF-8 encoding, and optionally support other encoding schemes of Unicode
Goal 2.3: BetterGEDCOM should utilize a standardized container specification to hold separate supporting files such as multimedia
Questions for this goal: What container specification should be used? What compression method? Should any file type be allowed? Should BetterGEDCOM have any role regarding file formats or should this be entirely left to software developers?
A new Page called Multimedia File Inclusion Issues has been added to discuss this topic in more detail.
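To make the questions above concrete, here is a minimal sketch that assumes a ZIP archive as the container. This is only one candidate, since no container specification has been chosen. The entry names `data.xml` and `media/` and the function names are hypothetical, picked for illustration.

```python
import io
import zipfile


def build_container(data_xml, media):
    """Pack the XML data file and supporting media files into one ZIP blob.

    ZIP and the entry names used here are assumptions, not project decisions.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The genealogical data itself, stored as UTF-8 text.
        zf.writestr("data.xml", data_xml.encode("utf-8"))
        # Each supporting file is stored unchanged under media/.
        for name, content in media.items():
            zf.writestr("media/" + name, content)
    return buf.getvalue()


def list_container(blob):
    """Return the names of the entries stored in a container blob."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return zf.namelist()


blob = build_container("<data/>", {"photo.jpg": b"\xff\xd8\xff"})
print(list_container(blob))
```

In this sketch any file type is accepted and the container does not inspect media contents, which corresponds to leaving file formats entirely to software developers.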
Goal 2.4: BetterGEDCOM should support a markup language such as HTML.
This will allow data to be formatted as desired by the user when displayed. It will allow HTML reference (href) links, lists, tables, and most other HTML constructs to be included and displayed as a web browser would display them. Programs may choose to ignore some or all of the HTML markup and display the data as plain text until the authors have the time to build in a rich display capability. Since a literal “<” cannot appear in the text data sections of XML (and “>” is conventionally escaped along with it), the standard method of encoding those two HTML characters in XML will be used: "<" will be encoded as "&lt;" and ">" will be encoded as "&gt;".
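As a rough illustration of this escaping, the sketch below uses Python's standard library to escape an HTML fragment, place it inside an XML element, and read it back. The `<note>` element and the function names are hypothetical, since BetterGEDCOM has not defined any element names. Note that the standard escaping also rewrites "&" as "&amp;".

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def wrap_html_note(html_text):
    """Escape HTML markup and wrap it in a hypothetical <note> element."""
    # escape() rewrites & as &amp;, < as &lt; and > as &gt;
    return "<note>" + escape(html_text) + "</note>"


def read_note(xml_text):
    """Parse the element; the XML parser restores the original characters."""
    return ET.fromstring(xml_text).text


html_note = 'See <a href="http://example.com/census">the census</a> & <b>page 4</b>.'
xml_note = wrap_html_note(html_note)
print(xml_note)
print(read_note(xml_note) == html_note)
```

A program that does not render HTML can still show the recovered string as plain text, or strip the tags first.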
Goal 2.5: Lines should have no length restriction.
There will be no need for anything like the continuation (CONT) tag or the misinterpreted concatenation (CONC) tag that GEDCOM has because of GEDCOM's maximum line length.
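For comparison, this is roughly what an importer has to do with GEDCOM today: CONC appends its text directly to the previous value, while CONT starts a new line within that value. The sketch below is a simplified illustration, not a full GEDCOM parser. It assumes lines of the form "level tag value" without cross-reference IDs, and the function name is hypothetical.

```python
def merge_continuations(lines):
    """Fold GEDCOM CONC/CONT lines into the value of the preceding line.

    Simplification: lines carrying a cross-reference ID (for example
    "0 @I1@ INDI") are not handled.
    """
    merged = []  # each entry is [level, tag, value]
    for raw in lines:
        # Strip only line endings; trailing spaces matter for CONC.
        parts = raw.rstrip("\r\n").split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if tag == "CONC" and merged:
            merged[-1][2] += value          # concatenate with no separator
        elif tag == "CONT" and merged:
            merged[-1][2] += "\n" + value   # continue on a new line
        else:
            merged.append([level, tag, value])
    return [tuple(entry) for entry in merged]


sample = [
    "1 NOTE This is a long note that was spl",
    "2 CONC it across lines.",
    "2 CONT Second line of the note.",
]
print(merge_continuations(sample))
```

With no line length limit, the whole note would be a single value and this folding step disappears.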
Goal 4: Test suite of data.
This would include GEDCOM and BetterGEDCOM files for testing:
(A) Proper Translation of GEDCOM to BetterGEDCOM via:
(1) Input GEDCOM, (2) Save to Database, (3) Retrieve from Database, (4) Export to BetterGEDCOM
For this test, the input file will be provided. The exported file is to be compared to a BetterGEDCOM file that will be provided.
(B) Proper Understanding and Flowthrough of all BetterGEDCOM information via:
(1) Input BetterGEDCOM, (2) Save to Database, (3) Retrieve from Database, (4) Export to BetterGEDCOM
For this test, the input file will be provided. The exported file is to be compared to the input file.
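The comparison step in both tests could be sketched as follows. This is a minimal illustration, assuming that the files are XML and that differences in attribute order and indentation should not count as failures. The element names and the function name `same_xml` are hypothetical, and `canonicalize` requires Python 3.8 or later.

```python
from xml.etree.ElementTree import canonicalize


def same_xml(expected, exported):
    """Compare two XML documents after C14N canonicalization.

    strip_text=True removes leading and trailing whitespace around text,
    so indentation differences are ignored. A real test suite would need
    to decide whether such whitespace is significant.
    """
    return (canonicalize(expected, strip_text=True)
            == canonicalize(exported, strip_text=True))


provided = '<person id="I1" sex="F"><name>Ann Lee</name></person>'
exported = '<person sex="F" id="I1">\n  <name>Ann Lee</name>\n</person>'
altered = '<person id="I1" sex="F"><name>Ann Leigh</name></person>'

print(same_xml(provided, exported))  # layout differs, data is the same
print(same_xml(provided, altered))   # data differs
```

The import, save, retrieve and export steps are performed by the application under test; the harness only compares that application's output with the provided file.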
What The Original GEDCOM Did
GEDCOM, which is an acronym for GEnealogical Data COMmunication, is a bit of a misnomer by today's terminology, because in fact GEDCOM would not really be considered a way to communicate data by most technology professionals. GEDCOM defined a way to format genealogical data and write it to a simple text file, editable by anyone (i.e., the data was not in a binary format or any other machine format that would be difficult to understand). This file could then be physically transported by any standard means (e.g., placed on a disk or any other computer media, attached to an email, etc.), stored, and later imported into a computer program to be used and manipulated.
Some things GEDCOM never sought to accomplish, it is important to note, are:
- GEDCOM never sought to define how information could be directly passed between two computer systems. Information in GEDCOM format had to be imported into and exported out of computer programs. This is an important point, because computer systems already have standards for passing information between themselves, and several aspects of the GEDCOM format do not accommodate actual live data transmission (and were not meant to).
- GEDCOM did not seek to define anything about how genealogical information or practice itself should be standardized. There are many genealogical standards organizations and bodies which deal with issues related to genealogical methodology. Likewise, there are many large, established genealogical software products and services available which make certain determinations about how to format or relate genealogical information. GEDCOM did not seek to influence either genealogical standards bodies or software companies but rather sought to be a neutral middle-ground format (a genealogical Switzerland, if you will) without seeking to push any party to a particular way of working or formatting genealogical data that was currently being created or worked on.
Thus GEDCOM, as conceived, was a portable storage file format. This aspect of GEDCOM, i.e., that it was conceived as a standardized way to archive as well as transport/transfer data, is an important feature worth retaining.
So, what's wrong with GEDCOM, anyway? Use this page to tell your horror stories, explain what data you want transferred, and even explain what features you want in your genealogy software that aren't there today.
Patch vs. Replace?
Do we seek to restrict this project to addressing the data issues directly related to shortcomings in the GEDCOM standard and other easier fixes, or should we tackle the issue of evidence and conclusions?
Should this current effort focus solely on a conclusion-based model, matching what exists in basically every software application today? Should we try to embrace evidence as a fundamental data entity? If embracing evidence, how is it possible to also fully and seamlessly support conclusions in the model?
What Do Genealogists Need?
What immediate needs does the genealogical community have?
- A widely recognized standard for data interchange
- Independent from any one organization, software program or service
- Suitable for long-term use as a standalone archival data store
- Ability to export ALL factual, relational, multimedia, deliberative and notational data from software applications
- Ability to note, mitigate, and resolve conflicts and ambiguities related to data format differences during the import process and before final import
What Do Genealogists Want?
What goals should we set for genealogical technology of the future?
- Direct communication between software programs or services for purposes of reference (Federation), updating, and synchronizing of data
Some personal notes on Family History practices in the UK and how they create goals.
What Do Genealogical Standards Require?
What genealogical standards exist which should be met and impact the goals of BetterGEDCOM?
What Do Genealogical Organizations Want/Need?
What needs do particular genealogical organizations have which should be accommodated?
- Specific data requirements for religious organizations
- Specific symbols widely or traditionally used
- Desired data standards not accommodated in other ways
What Do Software Developers Want/Need?
What technical considerations of software developers should be considered?
- Flexibility
- Commercial realities
- Precision in the syntax and the language of the standard
What Other Goals May Be Desirable That Are Not Otherwise Accommodated?
Are there goals or constituencies that could be considered which are not included in the above?
- Scientific needs and uses for other social sciences disciplines such as genetics, anthropology, sociology, etc.
"4.BG should require a software application to have a robust conflict resolution facility prior to final import to be compliant"
Do you think that it is reasonable, or even realistic, to try to tell software developers what to do with data? After all, we're defining a standard for a data file. What the developer does with that file is up to him or her.
Plus, now we'd have to define what a conflict is, what a resolution is and what robust means. Here we are really getting into the functionality of the genealogy database program and not the description of the data file created for exchange of data.
The process by which ISO-recognized standardization is pursued will have a direct effect on the sort of standards compliance that is even possible.
As points like this are brought up, the goals will be changed by other members of the wiki, or one of the moderators (of which I am one) will change the main pages to reflect the discussions that go on here on the discussion pages.
This wiki is sort of like making soup, however, so changes may not happen right this instant. The soup needs a little time to cook. We've only put the ingredients in the pot.
Several standards have statements about what to do with unsupported data. And several programs available today have functionality that could be considered "conflict resolution" functionality, e.g. mapping of event types.
A standard may also suggest how a program should behave.
I think one should have this aspect in mind, but it may be too early to discuss. We need some concrete problems on the table first.
I have listed the four most likely candidates (in my opinion) through which we could pursue some sort of standards process. (Note that IETF is not part of ISO.) Personally, I think AIIM is the way to go. Be that as it may, I strongly encourage folks to become familiar with these different bodies, how they work, what sorts of standards they have developed, etc. These are very well-defined bureaucratic processes, and the avenue we end up taking will very much define what sort of adherence mechanisms and guidelines we can even think about putting in place.
I agree with you, but as a User, when I receive data that was transported in BetterGEDCOM format and some of it is NOT included in what my program presents to me, I should be notified of what was dropped or was not in the correct format.
This is not a new 'requirement'; it's an enhancement of what already happens in some software packages when a GEDCOM file is imported. All I know now is that something was dropped, with very little detail. This request is for better reporting of what is not extracted from the BetterGEDCOM file.
One User's opinion.
Thank you,
Russ