BetterGedcom - GEDCOM X Framework

This is the GEDCOM X file format, as best as can be extracted from the GEDCOM X site. All the RDF and standardization parts of the GEDCOM X definition have been removed so that the data structures themselves are clearly revealed and can be discussed.

NOTE: This document is far from perfect. It needs the scrutiny and review of the primary GEDCOM X developers to see if it matches the spec. Such a review by them would be highly worthwhile, as this document was very difficult to produce from their specs, which are in places unclear, ambiguous, and incomplete.

GEDCOM X has both Elements (that begin with a lower case letter) and Data Types (that begin with an upper case letter). In some cases the same name (e.g. Person) will be defined as both an element and a data type. Despite efforts of GEDCOM X to explain the difference, it is not clear in their examples and descriptions when to use which and it is confusing. This document will not differentiate and items will be named with their lower case versions. For GEDCOM X's definition of "Types and Elements", see: http://www.gedcomx.org/Developers-Guide.html

GEDCOM X assigns IDs to items, defined with ID="...". Links to these IDs are made through resource="...".
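
The ID/resource linking can be sketched in Python with the standard library's xml.etree.ElementTree. This is only an illustration under assumptions: the wrapping <gedcomx> element, the IDs p1, p2 and r1, and the helper name resolve are invented here, and the ID and resource attribute names follow this document's simplified notation rather than any confirmed GEDCOM X serialization.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in this document's simplified notation.
# The <gedcomx> wrapper and the IDs p1, p2, r1 are made up for illustration.
SAMPLE = """
<gedcomx>
  <person ID="p1"/>
  <person ID="p2"/>
  <relationship ID="r1">
    <type>Couple</type>
    <person1 resource="p1"/>
    <person2 resource="p2"/>
  </relationship>
</gedcomx>
"""

root = ET.fromstring(SAMPLE)

# Index every element that carries an ID attribute.
by_id = {el.get("ID"): el for el in root.iter() if el.get("ID") is not None}


def resolve(link):
    """Return the element whose ID matches the link's resource attribute, or None."""
    return by_id.get(link.get("resource"))


rel = by_id["r1"]
target = resolve(rel.find("person1"))
print(target.tag, target.get("ID"))
```

Building the ID index once keeps each lookup cheap, which matters if a file holds many cross-references.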

Anything with "..." still needs to be discovered in GEDCOM X and expanded upon.

Please comment on an appropriate discussion page. There is one for general discussion, as well as one for the various model parts, and one for each item of the model. If you want to discuss one item in the model, please make a new page for that.

In March 2012, the purpose of GEDCOM X was stated as:

"To define an open data model and an open serialization format for exchanging the components of the genealogical proof standard."

When they talk about "the components of the genealogical proof standard," they mean these:

Search Reliable Sources
Cite Each Source
Analyze Sources, Information, and Evidence
Resolve Conflicts
Make a Soundly-Reasoned Conclusion

See: https://github.com/FamilySearch/gedcomx/issues/156#issuecomment-4848666

On August 14, 2012, the one REQUIREMENT for GEDCOM X was stated as:

"GEDCOM X must be able to accommodate FamilySearch's Platform API."

See: http://familysearch.github.com/gedcomx/2012/08/14/requirements.html

The Conclusion Model
Ref: http://www.gedcomx.org/Conclusion-Model.html
Ref: http://www.gedcomx.org/gxc.html

The Record Model:
Ref: http://record.gedcomx.org/Record-Model.html

Conclusion/Record Distinction:
Ref: http://record.gedcomx.org/Conclusion-Record-Distinction.html
Ref: http://record.gedcomx.org/twomodels-whitepaper.pdf

In GEDCOM X, Conclusions link via the "source" item to the records in the Record Model. The "attribution" item contains the proof statement for conclusions. This document includes both models.

Note that some items are named the same in the two models but are not defined the same, e.g. name, gender, relationship. To distinguish, in this document, the similar items in the record model will be suffixed by "-record".

Multimedia objects are somehow covered in the "source" item. That's not been expanded in this document.

There are three top-level records:
person - in the conclusion model,
relationship - in the conclusion model,
record - in the record model.

It can likely be argued whether there should be other top-level records that can have their own facts attached, e.g.:
place - in the conclusion model
group - in the conclusion model (for a group of people, or an organization)

PERSON :=

<person ID="XREF:PERSON">
- <<IDENTIFIER>> {0:M}
- <<LIVING>> {0:1}
- <<GENDER>> {0:1}
- <<NAME>> {0:M}
- <<FACT>> {0:M}
- <<SOURCE>> {0:M}
- <<NOTE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</person>

Ref: http://www.gedcomx.org/gxc_el_person.html
Ref: http://www.gedcomx.org/gxc_Person.html
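
The {min:max} cardinalities in the PERSON structure can be checked mechanically. The following Python sketch assumes that each <<ITEM>> expands to a child element with the lower-case name used in this document (for example <<GENDER>> becomes <gender>); the helper name check_person and the sample IDs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Cardinalities copied from the PERSON structure above.
# None stands for "M" (no upper limit).
PERSON_CARDINALITY = {
    "identifier": (0, None),
    "living": (0, 1),
    "gender": (0, 1),
    "name": (0, None),
    "fact": (0, None),
    "source": (0, None),
    "note": (0, None),
    "attribution": (0, 1),
}


def check_person(person):
    """Return a list of cardinality problems for a <person> element."""
    problems = []
    for tag, (low, high) in PERSON_CARDINALITY.items():
        count = len(person.findall(tag))
        if count < low or (high is not None and count > high):
            problems.append(f"{tag}: found {count}")
    return problems


# Hypothetical instances; the IDs and values are invented for illustration.
ok = ET.fromstring(
    '<person ID="p1"><living>false</living>'
    "<gender><type>Male</type></gender>"
    '<name ID="n1"/><name ID="n2"/></person>'
)
bad = ET.fromstring('<person ID="p2"><gender/><gender/></person>')

print(check_person(ok))
print(check_person(bad))
```

The same table-driven approach would work for the other structures by swapping in their cardinality tables.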

RELATIONSHIP :=

<relationship ID="XREF:RELATIONSHIP">
- <type><<RELATIONSHIP_TYPE>></type> {1:1}
- <person1 resource="XREF:PERSON" /> {1:1}
- <person2 resource="XREF:PERSON" /> {1:1}
- <<FACT>> {0:M}
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</relationship>

Ref: http://www.gedcomx.org/gxc_el_relationship.html
Ref: http://www.gedcomx.org/gxc_Relationship.html
Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/RelationshipType.html#values()
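
As a rough illustration of how relationships tie persons together, this Python sketch collects the persons linked to a given person by relationships of one type. The <gedcomx> wrapper, the sample IDs and the function name related are assumptions made for this example. It deliberately does not assume which of person1 and person2 plays which role, since that is not stated here.

```python
import xml.etree.ElementTree as ET

# Hypothetical data in this document's simplified notation.
SAMPLE = """
<gedcomx>
  <person ID="p1"/>
  <person ID="p2"/>
  <person ID="p3"/>
  <relationship ID="r1">
    <type>Couple</type>
    <person1 resource="p1"/>
    <person2 resource="p2"/>
  </relationship>
  <relationship ID="r2">
    <type>ParentChild</type>
    <person1 resource="p1"/>
    <person2 resource="p3"/>
  </relationship>
</gedcomx>
"""


def related(root, person_id, rel_type):
    """IDs of persons linked to person_id by relationships of rel_type."""
    found = []
    for rel in root.iter("relationship"):
        if rel.findtext("type") != rel_type:
            continue
        ends = [
            rel.find("person1").get("resource"),
            rel.find("person2").get("resource"),
        ]
        if person_id in ends:
            # Keep the other end of the relationship.
            found.extend(e for e in ends if e != person_id)
    return found


root = ET.fromstring(SAMPLE)
print(related(root, "p1", "Couple"))
print(related(root, "p1", "ParentChild"))
```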

RELATIONSHIP_TYPE :=

[ Couple | ParentChild | OTHER ]

Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/RelationshipType.html

RECORD :=

<record ID="XREF:RECORD">
- <<PERSISTENT_ID>> {0:1}
- <<ALTERNATE_ID>> {0:M}
- <source-record resource="XREF:RECORD" /> {0:M}
- <type><<RECORD_TYPE>></type> {0:1}
- <<PERSONA>> {0:M}
- <<RELATIONSHIP_RECORD>> {0:M}
- <<FACT_RECORD>> {0:M}
- <<ATTRIBUTION>> {0:1}
</record>

Ref: http://record.gedcomx.org/gxr_el_record.html
Ref: http://record.gedcomx.org/gxr_Record.html
Note: Not sure why there would be attribution here. Attribution is only for conclusions.
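
Because personas sit inside their record, finding a persona means walking the records first. The Python sketch below builds a simple name index over that nesting. The <records> wrapper, all IDs and names, and the function name index_personas are hypothetical, and only the <original> text of each name is indexed.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; IDs, names and the <records> wrapper are invented.
SAMPLE = """
<records>
  <record ID="rec1">
    <type>Census</type>
    <persona ID="per1"><name ID="nm1"><original>John Smith</original></name></persona>
    <persona ID="per2"><name ID="nm2"><original>Mary Smith</original></name></persona>
  </record>
  <record ID="rec2">
    <type>Birth</type>
    <persona ID="per3"><name ID="nm3"><original>John Smith</original></name></persona>
  </record>
</records>
"""


def index_personas(root):
    """Map each original name to a list of (record ID, persona ID) pairs."""
    index = {}
    for record in root.iter("record"):
        for persona in record.findall("persona"):
            for name in persona.findall("name"):
                text = name.findtext("original")
                if text:
                    index.setdefault(text, []).append(
                        (record.get("ID"), persona.get("ID"))
                    )
    return index


index = index_personas(ET.fromstring(SAMPLE))
print(index["John Smith"])
```

An index like this lets a persona be located from outside its record without scanning every record on each search.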

RECORD_TYPE :=

[ Bank | Birth | Census | Death | Draft | Land | Legal | Marriage | Migration
| Military | Pension | Probate | Roll | Tax | Vital | OTHER ]

Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/RecordType.html

Below are the substructures of GEDCOM X that could be identified.

AGE_PART_TYPE :=

[ Days | Hours | Minutes | Months | Years | OTHER ]

Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/AgePartType.html

AGE_RECORD :=

<age ID="XREF:AGE_RECORD">
- <<AGE_RECORD_PART>> {0:M}
- <original>...</original> {0:1}
- <interpreted>...</interpreted> {0:1}
- <<FORMAL>> {0:1}
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</age>

Ref: http://record.gedcomx.org/gxr_Name.html

AGE_RECORD_PART :=

<part ID="XREF:AGE_RECORD_PART">
- <type><<AGE_PART_TYPE>></type> {0:1}
- <original>...</original> {0:1}
- <interpreted>...</interpreted> {0:1}
- <<FORMAL>> {0:1}
- <<SOURCE>> {0:M}
</part>

Ref: http://record.gedcomx.org/gxr_Age.html

ALTERNATE_ID :=

<alternateId> {1:1}
- <type>...</type> {1:1}
- <value>...</value> {1:1}
</alternateId> {1:1}

Ref: http://record.gedcomx.org/gxr_el_record.html
Ref: http://record.gedcomx.org/gxr_Record.html
Ref: http://record.gedcomx.org/gxr_Persona.html

ATTRIBUTION :=

<attribution> {1:1}
- <modified>{datetime}</modified> {0:1}
- <proofStatement>{string}</proofStatement> {0:1}
- <confidence resource="<<CONFIDENCE_LEVEL>>" /> {0:1}
- <contributor resource="..." /> {0:1}
</attribution> {1:1}

Ref: http://www.gedcomx.org/gx_Attribution.html
Note: Why would the proof statement appear in the Document model?
Note: Confidence is a subjective part of a conclusion.
Note: Why would the confidence appear in the Document model?
Note: Likely, only modified and contributor should be in the Document model.

CONCLUSION :=

<conclusion ID="...">
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</conclusion>

Ref: http://www.gedcomx.org/gxc_Conclusion.html
Note: Haven't found where this is used in the rest of the grammar yet.

CONFIDENCE_LEVEL :=

[ Certainly | Probably | Possibly | Likely | Apparently | Perhaps | OTHER ]

Ref: http://www.gedcomx.org/gx_confidenceLevel.html
Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/ConfidenceLevel.html
Note: They should probably add the negatives of each, i.e. Certainly not, Probably not, etc.

DATE :=

<date>
- <original>...</original>
- <formal resource="..." datatype="...">{string}</formal>
</date>

Ref: http://www.gedcomx.org/gxc_Date.html

DATE_PART_TYPE :=

[ Days | Months | Years | OTHER ]

Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/DatePartType.html

DATE_RECORD :=

<date ID="XREF:DATE_RECORD">
- <<DATE_RECORD_PART>> {0:M}
- <original>...</original> {0:1}
- <interpreted>...</interpreted> {0:1}
- <<FORMAL>> {0:1}
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</date>

Ref: http://record.gedcomx.org/gxr_Date.html

DATE_RECORD_PART :=

<datePart ID="XREF:DATE_RECORD_PART">
- <type><<DATE_PART_TYPE>></type> {0:1}
- <original>...</original> {0:1}
- <interpreted>...</interpreted> {0:1}
- <<FORMAL>> {0:1}
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</datePart>

Ref: http://record.gedcomx.org/gxr_DatePart.html

FACT :=

<fact ID="XREF:FACT">
- <type><<FACT_TYPE>></type> {1:1}
- <<DATE>> {0:1}
- <<PLACE>> {0:1}
- <original>...</original> {0:1}
- <formal>...</formal> {0:1}
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</fact>

Ref: http://www.gedcomx.org/gxc_el_fact.html
Ref: http://www.gedcomx.org/gxc_Fact.html
Note: No distinction is made between events and facts. They are all called facts.
Note: That is a good thing!
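
Since events and other facts share one structure, a single routine can read both. The Python sketch below is an assumption-laden example: the sample values and the function name summarize are invented, and <<DATE>> and <<PLACE>> are expanded only to their <original> child.

```python
import xml.etree.ElementTree as ET

# Hypothetical person with one event-like fact and one attribute-like fact.
SAMPLE = """
<person ID="p1">
  <fact ID="f1">
    <type>Birth</type>
    <date><original>3 Mar 1850</original></date>
    <place><original>Boston</original></place>
  </fact>
  <fact ID="f2">
    <type>Occupation</type>
    <original>Blacksmith</original>
  </fact>
</person>
"""


def summarize(fact):
    """Return (type, date text, place text, value text); missing parts are None."""
    return (
        fact.findtext("type"),
        fact.findtext("date/original"),
        fact.findtext("place/original"),
        fact.findtext("original"),  # direct child only, not date/place originals
    )


person = ET.fromstring(SAMPLE)
summaries = [summarize(fact) for fact in person.findall("fact")]
for row in summaries:
    print(row)
```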

FACT_RECORD :=

<fact ID="XREF:FACT_RECORD">
- <type><<FACT_TYPE>></type> {1:1}
- <<DATE_RECORD>> {0:1}
- <<PLACE_RECORD>> {0:1}
- <original>...</original> {0:1}
- <interpreted>...</interpreted> {0:1}
- <formal>...</formal> {0:1}
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</fact>

Ref: http://record.gedcomx.org/gxr_Fact.html
Note: No distinction is made between events and facts. They are all called facts.

FACT_TYPE := { FACT_TYPE_PERSON | FACT_TYPE_COUPLE | FACT_TYPE_PARENT_CHILD }

Ref: http://www.gedcomx.org/gx_factType.html
Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/FactType.html

FACT_TYPE_PERSON :=

[ Adoption | AdultChristening | Affiliation | Age | Baptism | BarMitzvah | BatMitzvah | Birth
| Blessing | Burial | CasteName | Census | Christening | Circumcision | Citizenship | ClanName
| Confirmation | CountOfChildren | CountOfMarriages | Cremation | Death | DiedBeforeEight
| Dwelling | Emigration | Ethnicity | Excommunication | FirstCommunion | Flourish | Funeral
| GedcomUUID | Graduation | Illness | Immigration | Interment | Living | MaritalStatus
| MilitaryAward | MilitaryCompany | MilitaryDischarge | MilitaryRank | MilitaryRegiment
| MilitaryService | MilitaryServiceBranch | Mission | Move | NameOfShip | Naturalization
| Namesake | NationalId | NationalOrigin | NeverHadChildren | NeverMarried | NotAccountable
| Occupation | Ordinance | Ordination | PhysicalDescription | PortOfDeparture | PreviousResidence
| Probate | Possessions | Race | RelationshipToHead | ReligiousAffiliation | Residence
| Retirement | ScholasticAchievement | SocialSecurityNumber | Stillborn | TitleOfNobility
| TribeName | Twin | Will ]

Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/FactType.Person.html

FACT_TYPE_COUPLE :=

[ Annulment | CommonLawMarriage | CurrentlySpouses | Divorce | DivorceFiling | Engagement
| Marriage | MarriageBanns | MarriageContract | MarriageIntent | MarriageLicense
| MarriageNotice | MarriageSettlement | NumberOfChildren | Separation | UniversalId ]

Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/FactType.Couple.html

FACT_TYPE_PARENT_CHILD :=

[ Biological | Adopted | Step | Foster | Guardianship ]

Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/FactType.ParentChild.html

FORMAL :=

<formalValue resource="..." datatype="...">...</formalValue>

Ref: http://www.gedcomx.org/gx_FormalValue.html

GENDER :=

<gender>
- <type><<GENDER_TYPE>></type> {0:1}
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</gender>

Ref: http://www.gedcomx.org/gxc_el_gender.html
Ref: http://www.gedcomx.org/gxc_Gender.html

GENDER_RECORD :=

<gender ID="XREF:GENDER_RECORD">
- <type><<GENDER_TYPE>></type> {0:1}
- <original>...</original> {0:1}
- <interpreted>...</interpreted> {0:1}
- <<FORMAL>> {0:1}
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</gender>

Ref: http://www.gedcomx.org/gxc_el_gender.html
Ref: http://www.gedcomx.org/gxc_Gender.html

GENDER_TYPE :=

[ Female | Male | Unknown | OTHER ]

Ref: http://www.gedcomx.org/gx_genderType.html
Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/GenderType.html

IDENTIFIER :=

<identifier>
- <type><<IDENTIFIER_TYPE>></type>
- <value>...</value>
</identifier>

Ref: http://www.gedcomx.org/gx_Identifier.html

IDENTIFIER_TYPE :=

[ Primary | Forwarded | OTHER ]

Ref: http://www.gedcomx.org/gx_identifierType.html
Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/IdentifierType.html

LIVING :=

<living>[ true | false ]</living> {1:1}

Ref: http://www.gedcomx.org/gxc_Person.html

NAME :=

<name ID="XREF:NAME">
- <type><<NAME_TYPE>></type> {0:1}
- <preferred>[ true | false ]</preferred> {0:1}
- <<NAME_FORM_PRIMARY>> {0:1}
- <<NAME_FORM_ALTERNATE>> {0:M}
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</name>

Ref: http://www.gedcomx.org/gxc_el_name.html
Ref: http://www.gedcomx.org/gxc_Name.html
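
A name's primary form may carry a full text, a list of parts, or both. The Python sketch below picks a display string from a <name>, preferring <fullText> and otherwise joining the part texts. Joining the parts in document order with single spaces is an assumption of this example, not something the GEDCOM X material states, and the function name display_name and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical name in this document's simplified notation.
SAMPLE = """
<name ID="n1">
  <type>BirthName</type>
  <preferred>true</preferred>
  <primaryForm>
    <part><type>Given</type><text>John</text></part>
    <part><type>Surname</type><text>Smith</text></part>
  </primaryForm>
</name>
"""


def display_name(name):
    """Return fullText of the primary form, else the parts joined by spaces."""
    form = name.find("primaryForm")
    if form is None:
        return None
    full = form.findtext("fullText")
    if full:
        return full
    # Assumption: parts appear in display order.
    texts = [part.findtext("text") for part in form.findall("part")]
    return " ".join(t for t in texts if t) or None


shown = display_name(ET.fromstring(SAMPLE))
print(shown)
```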

NAME_FORM_ALTERNATE :=

<alternateForm>
- <fullText>...</fullText> {0:1}
- <<NAME_PART>> {0:M}
</alternateForm>

Ref: http://www.gedcomx.org/gxc_el_name.html
Ref: http://www.gedcomx.org/gxc_NameForm.html

NAME_FORM_PRIMARY :=

<primaryForm>
- <fullText>...</fullText> {0:1}
- <<NAME_PART>> {0:M}
</primaryForm>

Ref: http://www.gedcomx.org/gxc_el_name.html
Ref: http://www.gedcomx.org/gxc_NameForm.html

NAME_PART :=

<part>
- <type><<NAME_PART_TYPE>></type> {0:1}
- <text>...</text> {0:1}
</part>

Ref: http://www.gedcomx.org/gxc_el_name.html

NAME_PART_TYPE :=

[ Given | Prefix | Suffix | Surname | OTHER ]

Ref: http://www.gedcomx.org/gx_namePartType.html
Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/NamePartType.html

NAME_RECORD :=

<name ID="XREF:NAME_RECORD">
- <type><<NAME_RECORD_TYPE>></type> {0:1}
- <preferred>[ true | false ]</preferred> {0:1}
- <<NAME_RECORD_PART>> {0:M}
- <original>...</original> {0:1}
- <interpreted>...</interpreted> {0:1}
- <<FORMAL>> {0:1}
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</name>

Ref: http://record.gedcomx.org/gxr_Name.html

NAME_RECORD_PART :=

<part ID="XREF:NAME_RECORD_PART">
- <type><<NAME_PART_TYPE>></type> {0:1}
- <original>...</original> {0:1}
- <interpreted>...</interpreted> {0:1}
- <<FORMAL>> {0:1}
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</part>

Ref: http://record.gedcomx.org/gxr_NamePart.html

NAME_TYPE :=

[ Adoptive | AlsoKnownAs | BirthName | DeathName | Formal | MaidenName
| MarriedName | Name | Nickname | Religious | OTHER ]

Ref: http://www.gedcomx.org/gx_nameType.html
Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/NameType.html

NOTE :=

<note ID="XREF:NOTE"> {1:1}
- <text>...</text> {0:1}
- <<ATTRIBUTION>> {0:1}
</note> {1:1}

Ref: http://www.gedcomx.org/gx_el_note.html
Ref: http://www.gedcomx.org/gx_Note.html
Note: In the GEDCOM X Common Model, presumably can be added anywhere.

PERSISTENT_ID :=

<persistentId>...</persistentId>

Ref: http://record.gedcomx.org/gxr_Record.html
Ref: http://record.gedcomx.org/gxr_Persona.html

PERSONA :=

<persona principal="..." ID="XREF:PERSONA">
- <<PERSISTENT_ID>> {0:1}
- <<ALTERNATE_ID>> {0:M}
- <<GENDER_RECORD>> {0:1}
- <<AGE_RECORD>> {0:1}
- <<NAME_RECORD>> {0:M}
- <<FACT_RECORD>> {0:M}
- <<ATTRIBUTION>> {0:1}
</persona>

Ref: http://record.gedcomx.org/gxr_Persona.html
Ref: http://record.gedcomx.org/gxr_Record.html
Note: Not sure why there would be attribution here. Attribution is only for conclusions.

PLACE :=

<place>
- <original>...</original>
- <formal resource="..." datatype="...">{string}</formal>
</place>

Ref: http://www.gedcomx.org/gxc_Place.html

PLACE_PART_TYPE :=

[ Address | Cemetery | Church | City | Country | County | District
| Hospital | Island | MilitaryBase | Mortuary | Parish | PlotNumber
| PostalCode | PostOffice | Prison | Province | Section | Ship
| State | Territory | Town | Township | Ward ]

Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/PlacePartType.html

PLACE_RECORD :=

<place ID="XREF:PLACE_RECORD">
- <<PLACE_RECORD_PART>> {0:M}
- <original>...</original> {0:1}
- <interpreted>...</interpreted> {0:1}
- <<FORMAL>> {0:1}
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</place>

Ref: http://record.gedcomx.org/gxr_Date.html

PLACE_RECORD_PART :=

<placePart ID="XREF:PLACE_RECORD_PART">
- <type><<PLACE_PART_TYPE>></type> {0:1}
- <original>...</original> {0:1}
- <interpreted>...</interpreted> {0:1}
- <<FORMAL>> {0:1}
- <<SOURCE>> {0:M}
- <<ATTRIBUTION>> {0:1}
</placePart>

Ref: http://record.gedcomx.org/gxr_PlacePart.html

RELATIONSHIP_RECORD :=

<relationship ID="XREF:RELATIONSHIP_RECORD">
- <type><<RELATIONSHIP_TYPE>></type> {1:1}
- <person1 resource="XREF:PERSONA" /> {1:1}
- <person2 resource="XREF:PERSONA" /> {1:1}
- <<FACT_RECORD>> {0:M}
- <<ATTRIBUTION>> {0:1}
</relationship>

Ref: http://record.gedcomx.org/gxr_Relationship.html

RELATIONSHIP_TYPE :=

[ Couple | ParentChild | OTHER ]

Ref: http://www.gedcomx.org/gx_relationshipType.html
Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/RelationshipType.html

SOURCE :=

<source resource="XREF:RECORD">
- <<ATTRIBUTION>> {0:1}
- <description resource="..." /> {0:1}
- <type>...</type> {0:1}
</source>

Ref: http://www.gedcomx.org/gxc_el_source.html
Ref: http://www.gedcomx.org/gxc_SourceReference.html
Note: The data type "sourcereference" seems to be the same thing as the element "source" and is being treated as the same.

Primitive types

{datetime} :=

{string} :=

Comments

louiskessler 2012-06-11T17:20:30-07:00

General Discussion

Place General Discussion about the GEDCOM X Framework here

GeneJ 2012-06-11T19:16:30-07:00

Hi Louis,

Thank you for setting this up.

I'm still reading through the materials.

Do you know if there is a "research process" outline (or a similar set of assumptions) associated with GedcomX?

The one we worked on is here:
http://bettergedcom.wikispaces.com/Research+Process%2C+Evidence+%26+GPS

I'm asking in part because I'm interested and it would probably help me understand some of the objectives. Also, though, because I assume there are particular requirements behind GedcomX that are unique to FS and/or FS-like organizations that administer these large multi-user trees.

louiskessler 2012-06-11T20:01:25-07:00

GeneJ:

Right on their home page: http://www.gedcomx.org/Home.html they state they want to support the genealogical proof standard (which I presume is through their "attribution" mechanism).

On that page, they also show off Mark Tucker's Genealogy Research Process Map, but you'll notice that they state: "Note that a model for defining research goals is not currently within the scope of the core GEDCOM X project, but it might be a good candidate for a future GEDCOM X extension."

Louis

GeneJ 2012-06-11T20:51:39-07:00

Hi Louis,

Thank you.

I have seen their reference to Mark Tucker's map, but I had forgotten the note "... not currently within the scope ..."

From FS's perspective, I suppose their administrators aren't setting focused goals, per se.

Maybe because we considered so many different viewpoints in developing it, I rather like Adrian's outline. These kinds of diagrams do seem to help keep the work grounded in genealogy, too.

Other than in "general questions" is there a place questions about the assumed process should go in our outline? --GeneJ

ttwetmore 2012-06-12T04:30:52-07:00

Louis,

You've done a great job distilling out the model underneath GEDCOMX. Thanks.

Tom

AdrianB38 2012-06-12T09:18:40-07:00

Thanks are seconded from me.

Adrian

GeneJ 2012-06-12T14:17:57-07:00

This relates to the white paper supporting "Conclusion/Record Distinction," but I'm getting at the general requirements/motivations for GedcomX, so I'm posting it here.
http://bettergedcom.wikispaces.com/message/view/GEDCOM+X+Framework/54954018

Reading "Two Genealogy Models" (twomodels-whitepaper.pdf), n.d. [c. April 2012], I'm led to believe someone boils the issues down to two problems.
http://record.gedcomx.org/twomodels-whitepaper.pdf

The first is the lack of "a common repository of conclusions." The second problem expressed seemed to confuse a "bibliographic citation" with a reference note, so it seemed at best a badly devised "problem." [See Evidence Explained (2007), p. 820 for "citation" (including the forms thereof) and p. 819 for "bibliography."] So that brings us back to the first problem, which is bothersome to me.

Does anyone else question that the lack of a common repository of conclusions is a problem? What is the business about "a" common repository? --GeneJ

P.S. Does anyone know who collaborated on and/or authored the document? The work does refer to we, they, etc. When was it written? Where are the materials, references, documentation that support the many premises set out in the white paper? Ha! Where's the body of evidence?

AdrianB38 2012-06-13T12:12:34-07:00

Smile from me... On the GEDCOMX GITHUB issue 165 ( https://github.com/FamilySearch/gedcomx/issues/165 ) user "jralls" comments "I wonder if, given that Conclusion and Record are now split, you should reconsider using RDF in the conclusion model. It makes a some sense in a web services context but it's awfully cumbersome in a self-contained data interchange file where external links will be too fragile."

ttwetmore 2012-06-13T16:09:59-07:00

Three cheers for Mr. Rails, "Hip hip ..."

Tom

GeneJ 2012-06-13T17:22:39-07:00

I keep coming back to the requirements.

Assume GedcomX originates as the API to access the yet-to-be-public FamilySearch version of the online world tree. FamilySearch has said they want that to have a better reputation than its predecessor, but that is not necessarily saying too much as far as genealogical practices are concerned.

The whitepaper defines "modern technologies" as web services.

I've heard officials with FamilySearch say more than once that meeting 80% of users' needs is a target. It is an outstanding question how they define that 80% (and after all, they are the dictators).

I wonder if FamilySearch intends for GedcomX to support desktop software much. Maybe GedcomX is mostly about getting your conclusion data into FamilySearch's online tree.

ttwetmore 2012-06-15T09:03:04-07:00

You can follow the issues being tracked by the GEDCOMX designers by going to their web site:

http://www.gedcomx.org/

then clicking the Community link, and from the Community page clicking the Issues link.

I have posted here a few times with opinions and questions. There's a lot of GEDCOMX's "historical thinking" available here.

louiskessler 2012-06-11T17:21:18-07:00

The Conclusion Model

Talk about the Conclusion Model here.

AdrianB38 2012-06-12T09:17:11-07:00

Not quite sure where I should post this, but re Dates - it is not explained in the GEDCOMX stuff (cue British irony - "Now there's a surprise") but dates can be expressed in (at least) 2 forms. In one manner, they can be quoted (in formal form) according to the ISO standard of yyyy-mm-dd (gross over-simplification) but they can, I believe, also be quoted (in formal form) according to the GEDCOM method (e.g. BTW mmmm and nnnn). Part of the RDF stuff that we're trying to ignore (perhaps not always sensibly) would be some sort of declaration which system was in use, and possibly(???) the dating system could change by fact because it's all labelled by fact.

(a) this actually seems a good idea
(b) clearly I failed Citation 101 because I can't find the example again.

louiskessler 2012-06-11T17:21:46-07:00

The Record Model

Talk about the Record Model here.

louiskessler 2012-06-11T17:22:23-07:00

Conclusion/Record Distinction

Talk about Conclusion/Record Distinction here.

ttwetmore 2012-06-13T08:38:37-07:00

Adrian,

The only area where contrasting the conclusion and record models causes me much concern, is the fact that the record model has 3 "high level" things, person (called persona), record and relationship; whereas the conclusion model has only two of them, person and relationship, and omits an analog of the record record, which I believe should be the multi-role event.

The record record (sorry, it was their name choice, not mine), is roughly what I have been calling an evidence event record. GEDCOMX doesn't adequately handle the concept of a role, however, since the record record has only direct pointers to persons. Therefore the type of the role and attributes of the role player (e.g., age, marriage status) don't fit well in the model. If they gave those attributes to the persona record, which does not have a pointer back to the record, all context for these facts (e.g., the date and place when and where they were valid) is blowing in the wind. The DeadEnds model has the fix for this in the notion of a person reference which carries this information.

Because the conclusion model does not have an analog to the record record, we know that GEDCOMX has a limited view as to what an event is. You can tell from their model that they have relegated events to the status of facts that are subsumed in the bodies of person and relationship objects. Therefore the model has no corollary to a multi-rule event at the conclusion level. I don't think that is a good idea.

Tom

GeneJ 2012-06-13T08:50:52-07:00

@ Tom,

Last para ... "... multi-rule event."

Did you intend multi-role event?

AdrianB38 2012-06-13T11:54:45-07:00

Tom,
Re "persona record, which does not have a pointer back to the record". Oh. If that were an entity relationship diagram on http://record.gedcomx.org/Record-Model.html (which it isn't) and I were designing a relational (sorry!) DB from it (which I'm not), I'd have just stuck the appropriate keys in the appropriate tables, regardless of what columns had been put in the entity boxes in the ERD. So I hadn't given that a thought. But if you click through from that page to the Record Data Type ( http://record.gedcomx.org/gxr_Record.html ) , you find that the details for the personas and relationships are physically within the XML for the "record". So doesn't that suffice? No key necessary because it's physically within? Which means that persona isn't, after all, a top-level entity in the record-model doesn't it?

Nevertheless, your point that the "Record record" is missing something genealogically useful is important. Among the bits from GEDCOM 5.5's "SOURCE_RECORD" that I personally never use are tags for:
- event recorded,
- date of event recorded,
- place of event recorded,
- agency recording it

But in a decent system these would sound useful. Maybe the "facts" listed as attributes of the Record entity (substitute your own terms there) are meant to record those 4 things? Because as someone said, ANY record exists only because of an event. Even a record of a semi-permanent characteristic / attribute etc. And somehow the idea seems elegant that a Record of a wedding ceremony be mapped over to the Multi-role Event record of type "marriage ceremony". Though I'm not sure it's always quite as simple and elegant as that. A Record of a birth-registration would include facts about the birth. That should be mapped over to a Multi-role Event of type "birth". Which is an important realisation in itself because you're forced to think that you might just have a secondary source at this point.

It just seems more elegant to remove these differences.

AdrianB38 2012-06-13T12:00:05-07:00

Just realised: The UML diagram on http://record.gedcomx.org/Record-Model.html includes facts:Fact[] within the Record "entity", as it does personas and relationships - but while there is a drawn "relationship" from Record to Persona and Relationship there isn't one from Record to Fact, so surely either the line is missing to Fact or the facts:Fact[] within the Record shouldn't be there.

But I'm being logical rather than UML-wise so maybe I'm wrong.

Adrian

ttwetmore 2012-06-13T16:07:25-07:00

Adrian said,

>But if you click through from that page to the Record Data Type ( http://record.gedcomx.org/gxr_Record.html ) , you find that the details for the personas and relationships are physically within the XML for the "record". So doesn't that suffice? No key necessary because it's physically within? Which means that persona isn't, after all, a top-level entity in the record-model doesn't it?

Adrian, I completely missed that! Thanks for looking deeper and setting it straight. So the record record is a "container" object that contains its personas and relationships internally. Louis will love this because his view has always been to subsume the person information into a more general evidence record. Now it does make sense for the age info to be left in the persona sub-structure, and it also makes sense to let the role the person has in the event be another fact associated with the person.

I'm not against the idea of having a container object for the personas, but it will still be important to be able to search among all the personas in all the records without first finding the records. Therefore there will need to be some way to identify the persona substructures that can be used from outside the record record container.

Tom

louiskessler 2012-06-17T14:10:56-07:00

Yes, Tom. That's exactly why I like what they've done.

What I see is that if a repository has recorded all the information it has as Record records (I also think the double-r is unfortunate), then they'll be able to create external indexes with all the names in the personas and all the dates and places in the fact_records. Then you'll be able to search for all people whose names sound like xxxx who lived in yyyy between certain dates. The search will show the Records that have the closest matches.

The external indexes would solve the need you identified.

Louis

louiskessler 2012-06-17T14:12:20-07:00

Adrian,

Yes, they left off the connection in the diagram.

Their Records have facts which are: "Facts described by the record outside the scope of a persona or relationship."

Louis

ttwetmore 2012-06-17T17:41:00-07:00

Louis,

Don't know if you follow many of the GEDCOMX issues, but a number of the posters, once they realized that the record record was a container for persons, didn't like it. Whether that will impact the model is something we will have to wait on. I think anyone can post over there, so you might like to post your ideas.

Tom

ttwetmore 2012-06-17T17:55:29-07:00

For my concept of how software should fully support the genealogical research process, the software must be able to build multi-layer trees of persons. It will be possible to do this for persons that are structures inside record records rather than (and in addition to) persons that are top level entities.

My next generation of genealogical software supports person trees in its database (which, instead of being a custom database as has been the case for my previous software, is now MongoDB, which is perfect for storing and indexing genealogical information). If the GEDCOMX model does not support multi-tier persons then I will have to flatten trees to two levels on GEDCOMX export (which loses some of the historical research process), and only import two-level trees. It would be unfortunate if the archival format weren't powerful enough to hold the full structure of my database, but so be it.

To get what is really needed to support these ideas, necessary for full support of research, GEDCOMX will have to either merge their conclusion and record models or push the multi-tier concept into the conclusion model, which would be quite easy to do. I wonder if they have the vision.

louiskessler 2012-06-17T18:01:57-07:00

Tom,

I've been fairly active at GEDCOM X and we've posted on the same issues there a few times. I hadn't seen the Record record discussions though.

To me, their "persona" is definitely nothing like your "persona". Theirs is just a placeholder for a person in the Record record.

Louis

louiskessler 2012-06-17T21:43:28-07:00

Tom,

Don't just speak about your concepts. Do it! Get your program going and prove to everyone it's the right way.

You probably have a good chance of finishing your program before a new GEDCOM standard is ever finalized. And if your program proves to people that your concept is important, then it WILL influence the standard. Until then, out of hundreds of genealogy programs, none use your idea of multi-layer trees of persons. It is illogical to build into a standard something that no program needs, because all other programs would have to support it if it is in the standard - even if they don't use it.

Louis

ttwetmore 2012-06-18T06:18:26-07:00

Louis,

You're a great guy! Thanks. I am currently diverted in a fascinating area. It is a project that reads natural language genealogical text (e.g., obituaries) and extracts person records with events and relationships. Imagine pasting text from an on-line newspaper into a program and getting back a [B]GEDCOM[X] file of the persons mentioned!

AdrianB38 2012-06-12T08:14:40-07:00

I think it would help if we understood what "two models" actually means. Does (e.g.) it mean a bit of software would only operate in/on one model?

It would also help if we could see some actual issues that having one model results in. I see a list of dichotomies, most of which simply make me say, "Yes, so what?" E.g. Fundamental Unit in one is the Record, Fundamental Unit in the other is the Person. What's wrong with having two fundamental units? Yes, the goals of the processes using the data are different. Again, so what? Process does not equal data.

The linked document states "Combining these domains would make for a smaller, more generic model which would confuse the “person existed” assertion with the “record existence” documentation." Wouldn't one be labelled "person" and the other "record"? And, to take an example, if Name is defined differently in the two models, then the possibility of incompatibility arises. Bad.

For me, I feel that a legitimate issue - different viewpoints - has been built up into something huge.

BUT I also worry that we _might_ be making too much of this ourselves. Is there a danger that we're going round thinking the distinction has genealogical significance when it doesn't? If the coders want to hack the "real-world" stuff about and parcel it up behind the scenes, fine, but don't bother me about it. Might this be the case? I dunno. I can't understand enough of the implications yet.

GeneJ 2012-06-12T08:31:15-07:00

Ha! I'm writing out a comment on the same thing, Adrian.

Will post shortly.

ttwetmore 2012-06-12T09:01:48-07:00

Adrian,

The conclusion model only holds conclusions, which are persons and relationships, supported by facts (that are ALSO conclusions), that include dates and places. That's all.

The source model holds source metadata. The conclusion model allows its objects to refer to objects in the source model via source references. With these two models working together you reach GEDCOM equivalence. Any software that deals with genealogical data like today's systems do, would implement the conclusion and source metadata models.

Neither the conclusion model nor the source model supports transcribed evidence records. That is the purpose of the record model, which is not part of the first release of GEDCOMX. Think of the record model as what you would need to support data extraction projects like the 1940 census indexing project.

I believe that any software that supports the research process must support the extraction of evidence from sources and its storage in evidence records. For me the ability to use and manipulate evidence records is key. And this is exactly what the GEDCOMX record model is for -- its records are the evidence records, including my hobby horse, the persona record.

So I would answer that any software system designed to support the full research process needs all three models.

Unfortunately, GEDCOMX gives no hints on how the similarly named records from the conclusion and record models would actually relate in a system that used them both. I have had that worked out in the DeadEnds model for years.

I believe that it would be much simpler and more effective for the conclusion and record models to be combined into one, and let the person, etc, records do double duty. I argued that strongly when I was asked to comment on GEDCOMX many months ago, but the idea of the two models was so deeply ingrained by them, that I didn't have a chance. I don't like the strict black and white view of the world that the two models enforce. By combining the models one reaches the much more realistic, and often very gray, world in which genealogical data actually lives. Combining the models importantly allows the multi-layer person trees that I believe are key to managing a complex genealogical research project. (Disclaimer: Louis believes I am completely off my rocker (!!) about this.)

Hope that helps a little, while also allowing me to rock back and forth on my horse.

Tom

AdrianB38 2012-06-12T09:23:03-07:00

"GEDCOMX gives no hints on how the similarly named records from the conclusion and record models would actually relate in a system that used them both"
That's where I can't see it clearly either. Again, it depends on whether the 2 models carry through into the software - if they do, and there's a clash, what then?

Maybe it's as simple as "RECORD.NAME" and "CONCLUSION.NAME" - but why can't it be written down?

ttwetmore 2012-06-12T11:00:38-07:00

Adrian,

With respect to GEDCOMX the software's job is to import and export files in its format. If the GEDCOMX file contained both conclusion and record model data, the software would know the difference because the models would hold different formats for the two types of records.

Presumably the software could use the same data classes for the two GEDCOMX person entities, but the software has to at least keep a tag so it can keep them segregated. I would use the same classes or at least base classes. In reality I already have the classes that I use for genealogical entities, and they are flexible enough to import and export GEDCOMX formatted data.
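A small Python sketch of the "same classes plus a tag" idea, under stated assumptions: the `Model` enum, the `PersonEntity` class and the sample IDs are all hypothetical and are not taken from GEDCOMX or from any existing program.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Model(Enum):
    """Which GEDCOMX model an imported entity came from (hypothetical tag)."""
    CONCLUSION = "conclusion"
    RECORD = "record"


@dataclass
class PersonEntity:
    """One class serves both models; the tag keeps them segregated."""
    id: str
    model: Model
    names: List[str] = field(default_factory=list)


def by_model(entities: List[PersonEntity], model: Model) -> List[PersonEntity]:
    """Select only the entities that belong to one model."""
    return [entity for entity in entities if entity.model is model]


imported = [
    PersonEntity("p1", Model.CONCLUSION, ["Anna Bruce"]),
    PersonEntity("r1-persona1", Model.RECORD, ["A BRUCE"]),
]
conclusions = by_model(imported, Model.CONCLUSION)
personas = by_model(imported, Model.RECORD)
```

On export, the tag decides which GEDCOMX format each object is written in.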

I don't know what you mean by a clash. Presumably the software would always know which objects belong to which model, and would operate accordingly.

Tom

AdrianB38 2012-06-12T12:02:43-07:00

Tom - when I referred to "clash", I meant something along the lines of (say) DATE in the record and conclusion model being defined differently - e.g. the first allowing textual stuff like "last summer" but the second demanding it be in ISO / numeric format. With the result that if someone wrote some logic to auto format the ISO / numeric date and it turned out to contain "last summer" instead, things might go awry. If indeed, "the software would always know which objects belong to which model, and would operate accordingly" then that's fine by me - it's simply that I don't know this and so am going into maximum suspicion mode when presented with two models.
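The kind of clash described above can be sketched in Python. This assumes, purely for illustration, that one model requires ISO dates while the other allows free text; neither rule is confirmed for GEDCOMX. The function names are hypothetical.

```python
from datetime import date
from typing import Optional

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def parse_iso_date(text: str) -> Optional[date]:
    """Return a date for ISO input such as 1909-06-26, else None."""
    try:
        return date.fromisoformat(text.strip())
    except ValueError:
        return None


def format_date(text: str) -> str:
    """Reformat ISO dates for display; pass free text through unchanged.

    Formatting logic that assumed every value was ISO would fail on
    "last summer"; checking first avoids that.
    """
    parsed = parse_iso_date(text)
    if parsed is None:
        return text
    return f"{parsed.day} {MONTHS[parsed.month - 1]} {parsed.year}"


print(format_date("1909-06-26"))
print(format_date("last summer"))
```

The point of the sketch is only that shared logic has to know, or test, which kind of value it has been handed.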

Adrian

ttwetmore 2012-06-12T18:25:40-07:00

My views on dates are unconventional. I don't believe there should be any standards for dates in genealogical databases. I am anti the trend towards standardization in most genealogical contexts, including personal names and places.

My LifeLines program has been in use by Unix-based genealogists for over twenty years. LifeLines users enter dates in any format they like, and the LifeLines date parser figures them out. When LifeLines generates display screens or writes reports users can choose their preferred output formats for dates, and LifeLines will put them in that format.
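As a rough illustration of accepting several input formats and writing one chosen output format, here is a Python sketch. It is not the LifeLines date parser, whose internals are not described here; the format list and function names are assumptions, and month names are matched using the current locale (assumed English).

```python
from datetime import date, datetime
from typing import Optional

# Input formats to try, in order (an illustrative list, far from complete).
INPUT_FORMATS = ["%Y-%m-%d", "%d %B %Y", "%d %b %Y", "%B %d, %Y"]


def parse_date(text: str) -> Optional[date]:
    """Try each known format in turn; return None if none matches."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def render(text: str, output_format: str = "%Y-%m-%d") -> str:
    """Show a date in the user's preferred format, or as entered if unparsed."""
    parsed = parse_date(text)
    return text if parsed is None else parsed.strftime(output_format)


for entered in ["26 June 1909", "June 26, 1909", "1909-06-26", "last summer"]:
    print(entered, "->", render(entered))
```

The stored value stays exactly as the user typed it; only the display is normalized.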

My current big genealogical project is a natural language processor that reads obituaries and other notices in natural English and extracts genealogical information from the text. This project recognizes dates in essentially every format that they appear in English text. Such things as "He will be buried next Thursday..." or "In your notice of the 3rd inst. ..." or "in January last year" have to be understood and resolved. It is surprisingly easy to understand what a date means in almost any format someone wishes to express it in. After all, that's why we express them in those ways: so others can understand them!

My opinions are so far off the mainstream tracks that I harp on them for no particular reason. To try to answer Adrian's question, I would expect that GEDCOMX must choose the same date standard for both models.

Tom

AdrianB38 2012-06-13T02:53:04-07:00

Tom said "I would expect that GEDCOMX must choose the same date standard for both models."

I would also expect that to be the default choice. My concerns are really that I still don't fully understand the implications of having two models - if it's only affecting the documentation, not the software or genealogy, I can't be bothered about it. However, I suspect it's a bit more than that - I suspect decisions have to be made manually that have implications on the software and I worry about the risks of mismatching decisions.

I believe, Tom, that both you and I think the idea of two models is bad as it increases risk - I have absolutely no feel for how that risk works out in reality, you seem more relaxed that getting round the risk is do-able.

Adrian
PS - the date issue is separate - I only chose it here as one example I could think of where issues between two forms might creep in.

louiskessler 2012-06-11T17:23:45-07:00

Relationship record

RELATIONSHIP :=
<relationship ID="XREF:RELATIONSHIP">
<type><<RELATIONSHIP_TYPE>></type> {1:1}
<person1 resource="XREF:PERSON" /> {1:1}
<person2 resource="XREF:PERSON" /> {1:1}
<<FACT>> {0:M}
<<SOURCE>> {0:M}
<<ATTRIBUTION>> {0:1}

</relationship>

Ref: http://www.gedcomx.org/gxc_el_relationship.html
Ref: http://www.gedcomx.org/gxc_Relationship.html
Ref: http://www.gedcomx.org/java/apidocs/org/gedcomx/types/RelationshipType.html#values()
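For illustration, a Python sketch that builds a `relationship` element shaped like the grammar above and checks that its `resource` links point at known person IDs. The element and attribute names follow this page's reading of the spec, not a verified GEDCOM X schema; the type value "Couple" and the IDs are placeholders, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_relationship(rel_id, rel_type, person1_id, person2_id):
    """Build <relationship ID=...> with one type and two person links."""
    rel = ET.Element("relationship", {"ID": rel_id})
    ET.SubElement(rel, "type").text = rel_type
    ET.SubElement(rel, "person1", {"resource": person1_id})
    ET.SubElement(rel, "person2", {"resource": person2_id})
    return rel


def unresolved_references(rel, known_person_ids):
    """List resource="..." values that match no known person ID."""
    missing = []
    for child in rel:
        target = child.get("resource")
        if target is not None and target not in known_person_ids:
            missing.append(target)
    return missing


rel = build_relationship("R1", "Couple", "P1", "P2")
print(ET.tostring(rel, encoding="unicode"))
print(unresolved_references(rel, {"P1"}))
```

The optional `<<FACT>>`, `<<SOURCE>>` and `<<ATTRIBUTION>>` parts are left out because their structure is defined elsewhere.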

louiskessler 2012-06-11T17:24:22-07:00

Record record

RECORD :=
<record ID="XREF:RECORD">
<<PERSISTENT_ID>> {0:1}
<<ALTERNATE_ID>> {0:M}
<source-record resource="XREF:RECORD" /> {0:M}
<type><<RECORD_TYPE>></type> {0:1}
<<PERSONA>> {0:M}
<<RELATIONSHIP_RECORD>> {0:M}
<<FACT_RECORD>> {0:M}
<<ATTRIBUTION>> {0:1}

</record>

Ref: http://record.gedcomx.org/gxr_el_record.html
Ref: http://record.gedcomx.org/gxr_Record.html
Note: Not sure why there would be attribution here. Attribution is only for conclusions.

RECORD_TYPE :=
[ Bank | Birth | Census | Death | Draft | Land | Legal | Marriage | Migration
| Military | Pension | Probate | Roll | Tax | Vital | OTHER ]

http://www.gedcomx.org/java/apidocs/org/gedcomx/types/RecordType.html

AdrianB38 2012-06-12T08:34:22-07:00

Two pet ideas of mine:
1. Name - looking at the UML diagram of the Record Model, I see the possibility that a Persona has multiple names. Good. If it's a document recording someone's change of name, then we want 2 names (at least).

But where's the ability to date them?

To give a concrete example - the Petition for Naturalization for my GG Aunt gives 3 names. She's A B BRUCE also known as B BRUCE. And she entered the USA under the name A BRUCE on 26 June 1909. And she wants her name to be changed formally to B BRUCE. Now there are several facts there but no way of linking the Immigration fact (whatever you want to call it) with the name, not even by implication of the two dates being the same, because the Name isn't dated.
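A sketch, in Python, of what a dated name would make possible for this example. The `DatedName` and `Fact` classes are hypothetical; the Record Model as described here has no date on Name, which is exactly the gap being pointed out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatedName:
    """A name with an optional date (hypothetical extension)."""
    text: str
    date: Optional[str] = None


@dataclass
class Fact:
    """A dated fact such as an immigration event."""
    type: str
    date: str


def names_matching_fact(names: List[DatedName], fact: Fact) -> List[DatedName]:
    """Link names to a fact by implication: the dates are the same."""
    return [name for name in names if name.date == fact.date]


names = [
    DatedName("A B BRUCE"),
    DatedName("B BRUCE"),
    DatedName("A BRUCE", date="1909-06-26"),  # name used on entry to the USA
]
immigration = Fact("Immigration", "1909-06-26")
print([name.text for name in names_matching_fact(names, immigration)])
```

Without the `date` field on the name, the last step has nothing to compare against.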

AdrianB38 2012-06-12T08:48:15-07:00

Idea 2: Relationships (or something similar) need more than 2 personas.

Suppose I have a list of the 6 (say) partners in the BG import-export partnership of San Francisco. How do I record that in this Record Model?

I would clearly have a list of 6 personas, each with 1 name. Now I could give each a dated occupation fact - the occupation would be "partner", the agency (presupposing that's agreed and added) would be "BG of San Francisco" or whatever the written name is. That's how I'd do it now. But this misses an essential element of partnership and treats it just like another job. That element is that the 6 of them are, in various ways, mutually reliant on each other to stump up cash and to receive profits. I'd like therefore to create a single relationship that includes the 6 of them and has a type of Business-Partnership, or something like that.

This is less important than dates for names because it can currently be done by occupation and by linking notes all over the place, but that then loses structure and requires the human to read all those notes.
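To make the difference concrete, here is a Python sketch of a single relationship holding any number of personas. The `GroupRelationship` class and the persona IDs are hypothetical; nothing like it is defined in the Record Model as described on this page.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple


@dataclass
class GroupRelationship:
    """One relationship shared by many personas (hypothetical)."""
    type: str
    persona_ids: List[str] = field(default_factory=list)

    def partners_of(self, persona_id: str) -> List[str]:
        """Everyone else in the same relationship."""
        if persona_id not in self.persona_ids:
            return []
        return [p for p in self.persona_ids if p != persona_id]

    def as_pairs(self) -> List[Tuple[str, str]]:
        """The two-persona relationships needed to express the same thing."""
        return list(combinations(self.persona_ids, 2))


partnership = GroupRelationship(
    "Business-Partnership",
    ["persona1", "persona2", "persona3", "persona4", "persona5", "persona6"],
)
print(partnership.partners_of("persona1"))
print(len(partnership.as_pairs()))
```

With only person1/person2 relationships, six partners need fifteen separate relationship records to say what one group relationship says.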

louiskessler 2012-06-11T17:25:35-07:00

Person record

PERSON :=
<person ID="XREF:PERSON">
<<IDENTIFIER>> {0:M}
<<LIVING>> {0:1}
<<GENDER>> {0:1}
<<NAME>> {0:M}
<<FACT>> {0:M}
<<SOURCE>> {0:M}
<<NOTE>> {0:M}
<<ATTRIBUTION>> {0:1}

</person>

Ref: http://www.gedcomx.org/gxc_el_person.html
Ref: http://www.gedcomx.org/gxc_Person.html
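The `{min:max}` markers in the grammar can be checked mechanically. The Python sketch below is one way to do it; the child tag names (`living`, `gender`, `name`, `attribution`) are assumed lower-case forms of the grammar's placeholders and may not match the real GEDCOM X element names.

```python
import xml.etree.ElementTree as ET

# (min, max) occurrences per child tag; None stands for "M" (unbounded).
# Tag names are assumptions made for this example.
PERSON_RULES = {
    "living": (0, 1),
    "gender": (0, 1),
    "name": (0, None),
    "attribution": (0, 1),
}


def cardinality_errors(element, rules):
    """Report child tags that occur too few or too many times."""
    counts = {}
    for child in element:
        counts[child.tag] = counts.get(child.tag, 0) + 1
    errors = []
    for tag, (low, high) in rules.items():
        found = counts.get(tag, 0)
        if found < low or (high is not None and found > high):
            limit = "M" if high is None else high
            errors.append(f"{tag}: found {found}, expected {low}:{limit}")
    return errors


person = ET.fromstring(
    '<person ID="P1">'
    "<gender>Female</gender><gender>Male</gender>"
    "<name>A B BRUCE</name><name>B BRUCE</name>"
    "</person>"
)
print(cardinality_errors(person, PERSON_RULES))
```

Two names pass because `<<NAME>>` is `{0:M}`; two genders fail because `<<GENDER>>` is `{0:1}`.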

louiskessler 2012-06-18T18:37:39-07:00

The Purpose of GEDCOM X

I've added to the GEDCOM X Framework, the Purpose of GEDCOM X, which Ryan Heaton posted at the GEDCOM X site. It is:

The purpose of GEDCOM X has been stated as:
To define an open data model and an open serialization format for exchanging the components of the genealogical proof standard.

When they talk about "the components of the genealogical proof standard," they mean these:
•Search Reliable Sources
•Cite Each Source
•Analyze Sources, Information, and Evidence
•Resolve Conflicts
•Make a Soundly-Reasoned Conclusion

See: https://github.com/FamilySearch/gedcomx/issues/156#issuecomment-4848666

My question to everybody is: How well does this purpose fit in with the goals of BetterGEDCOM and FHISO?

louiskessler 2012-08-15T14:00:28-07:00

Thank you Ryan, for expressing your views. I personally appreciate what you are doing and where you're trying to take it. Your post you just published at GEDCOM X: Whence FHISO? http://familysearch.github.com/gedcomx/2012/08/15/whence-fhiso.html makes your views very clear.

If you were the person with the authority behind GEDCOM X, then I'd feel very good indeed. But you've got a big gorilla behind you, who needs to come out and be more cooperative with the FHISO people, because ultimately, all parties are after the same thing - the ability to exchange data.

The fact that FamilySearch has not come to an agreement with FHISO, when it was FamilySearch who initiated the push to get FHISO formed, is in my opinion as an interested developer, dragging everyone's feet in the ground, causing much friction, and needs to be resolved as soon as possible. Only then, will both parties be able to work cooperatively, and FHISO will be able to spend their time and resources to aid the initiative.

Louis

louiskessler 2012-08-15T14:12:22-07:00

Also see this new post by Tamura Jones:
http://www.tamurajones.net/GEDCOMXFamilySearchFirst.xhtml

ttwetmore 2012-08-16T07:21:04-07:00

Ryan,

Thanks for the correction -- you are doing the Conclusion model first.

Quick follow-on then -- since you are doing the CM first I assume its main application will be the on-line pedigree projects, the model to be behind the new NFS.

NFS has a revolutionary feature, IMHO, even if it were not expressed in that way. That was the ability to join together person records from different sources into groups that users could manage, bringing together records that the user believes to be the same person, or splitting groups that the user believes contain more than one real person. Because groups can be joined, merged, broken apart, etc, the individual records must maintain their integrity -- they can never be merged with other records, or merged into a higher level type concept object (meaning merging in a destructive way, as in giving up your essence to be part of something bigger).

This is one form of the Persona/Person record dichotomy that I have been espousing for 20 years. At least in the limited application of on-line pedigrees, I felt that NFS really got it right by using this dual-level concept. Of course the practical issues came to the fore, 1st being the HORRIBLE quality of the vast number of records that were used to seed the NFS, and 2nd the merge/split/re-merge/re-split wars that can occur when different users disagree on interpretations of the records. However, I believe that NFS now allows each user to maintain their own groupings of the "Personae."

All this is leading up to the point that in the CM I don't see the facilities that could be used to support the dual-level nature of the NFS tree. And since the dual nature of the NFS is, even with its problems, IMHO the most important concept behind it, I am concerned about whether the CM would support the feature.

Contrast the NFS with WikiTree. I joined WikiTree last night and tried to upload a GEDCOM file. The file had more persons in it than the general WikiTree guidelines (though not all that many, a few % points too many), and they would not upload it. The reason is that WikiTree wants a FULLY MERGED tree -- if two or more people add the same persons, they are responsible for immediately merging their data into a single record. If someone adds a large GEDCOM they are overly concerned with the potential problem of introducing duplicates. You can certainly understand their concerns, but do they really have the right solution? If I introduce duplicates into a tree that has some fundamental disagreements with my interpretation of the facts, there are some major ramifications. I either rearrange people and royally p*ss someone else off, or I merge my data into a structure that I fundamentally disagree with.

Is the CM going to allow the dual nature of person records required to support NFS?

ACProctor 2012-08-16T07:35:22-07:00

@Tom, after our recent differences on the family-units in a separate thread, I did make a note that a generalised SET/GROUP feature could be used to model both family-units (however subjective) and personae. As well as not mandating either concept, it would be open-ended to accommodate other types of group, including work units.

If this were done as a non-merging group (e.g. collected links rather than actual merged records) then the same persons could be present in multiple groups at the same time.
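A short Python sketch of such a non-merging group, assuming groups hold only links (record IDs) and never copy or alter the records themselves. The `GroupIndex` class and the IDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class GroupIndex:
    """Groups as collections of links; the linked records are never changed."""

    def __init__(self) -> None:
        self._members: Dict[str, Set[str]] = defaultdict(set)

    def add(self, group_id: str, record_id: str) -> None:
        """Link a record into a group."""
        self._members[group_id].add(record_id)

    def remove(self, group_id: str, record_id: str) -> None:
        """Unlink a record; the record itself is untouched."""
        self._members[group_id].discard(record_id)

    def members(self, group_id: str) -> List[str]:
        return sorted(self._members.get(group_id, set()))

    def groups_of(self, record_id: str) -> List[str]:
        """All groups a record is currently linked into."""
        return sorted(g for g, ids in self._members.items() if record_id in ids)


index = GroupIndex()
index.add("family-1", "p1")
index.add("family-1", "p2")
index.add("person-bag-7", "p1")   # the same record in a second group
print(index.groups_of("p1"))

index.remove("family-1", "p1")    # splitting is just removing a link
print(index.groups_of("p1"))
```

Because membership is only a link, joining and splitting groups never destroys or rewrites the underlying records.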

I know this is OT, and only marginally relevant to this thread, but this seemed a good opportunity to mention that I want to investigate it further.

Tony

heatonra 2012-08-16T08:05:57-07:00

@ttwetmore:

Is the CM going to allow the dual nature of person records required to support NFS?

I hope so, yes. The reason for my equivocation is I'm not sure I completely understand your definition of the feature. But I hope you'll be willing to help us get it right.

heatonra 2012-08-16T09:02:35-07:00

@louiskessler

Also see this new post by Tamura Jones:

Tamura's always a fun read, isn't he? I feel bad for him sometimes, though. He likes to put on a blindfold and throw grenades, which gets him a lot of attention (and even some elation from people who have personal grievances that get hit by the shrapnel), but it doesn't make him very many friends.

Anyway, most of his post is just rantings about his prejudices and false assumptions about the project that don't merit comment.

But I think it's probably worth pointing out how ridiculous it is for him to imply that GEDCOM X isn't/won't consider the needs and feedback from the community. I think anybody who has actually participated in the project can see the sincere effort we're making to incorporate the suggestions and feedback we're getting. For example, here's a comment from community member jralls who is surprised at how many changes we're willing to take on as a result of the feedback and comments on the source description model. Tamura himself has made a big impact on the direction of the file format and serialization format by his own post that mocks the initial draft.

His implications that the project isn't really "open" are pretty silly, too. Every little change we make is fully visible and open for comment from the public. The word "proprietary" might be debatable, as is whether the project is really a "community" project, but I think it's weak to imply that the project isn't open.

I'd make those comments on his blog, too, if it wasn't "proprietary" and was "open" for comments by the "community". :-)

I'd really like to meet him personally some day, to get to know him and get familiar with what drives him and motivates him. I'd love to have some more insight as to why he's got the prejudices that he's got and what his personal hopes are for the genealogical industry as a whole. My impression is that he's got a wall there that is hard to penetrate.

louiskessler 2012-08-16T09:11:05-07:00

Personally, I think every person should have their own data kept separate, and they should be allowed to "virtually merge" other people's to their data the way they see fit.

To me, the concept of a world tree (or Borg Tree as TJ likes to call it) is impossible and even idiotic to attempt, because everyone has their own subjective concept of what is correct and what is wrong.

And any attempt to make that world tree will ALWAYS result in merge/split/re-merge/re-split wars. NFS is attempting to prevent that by requiring source documentation, and making the m/s/rm/rs process more onerous. But there are contradictory sources, sources that are plain wrong, and people who are adamant about what they believe despite overwhelming evidence against them. You can't stop this. World trees will never be able to contain the "truth". All it will contain are the conclusions of the people most willing to slog through the trenches.

Now add the complication of having not one representation for a conclusion person, but having many - dozens, hundreds, even thousands of possible people representing my grandfather, who I have to now slog through to try to see who I believe are the correct ones. Forget that, I say. What I want is to be simply given the source data that I can look through directly for names and places that might have info on my grandfather and his family. If I find sources that I think pertain to my grandfather, I will add them to my ONE conclusion person, and I will document those sources properly.

That will be in my personal data file. Someone should be able to find MY conclusions in NFS and be able to "virtually" merge them into their own if they believe we have the same people. Maybe I can be notified of this and we share information personally between each other like true genealogists should - and we can both improve the quality of our separate conclusion datasets. The virtual merge will then be easier next time, because there will be fewer conflicts after the collaboration.

I was at RootsTech 2012. I heard Jay Verkler talk about the NFS concepts, and I heard Ron Tanner go into detail about the merging and conflict resolution process in all its gory details. I completely disagree with merging the actual conclusions. I think everything each researcher does should remain as a separate entity, and virtual merging should be the method of combining other people's information into one's own.

Louis

louiskessler 2012-08-16T09:23:01-07:00

Ryan:

Re Tamura, all I can suggest is: Make love, not war.

Louis

ttwetmore 2012-08-16T12:01:20-07:00

Ryan,

I believe the NFS approach to person records and conclusions is as good as it is possible to be for a cooperative on-line tree. Having good records for many persons, and often many records for the same person from different sources (what I call Personae) and then bags to put them in (what I call Persons) to represent your conclusions is the best one can hope for. The NFS, as faulty as it may be, bases its infrastructure on this duality of concepts, and in my opinion it is the only infrastructure that can lead to success.

Then allowing each user to define the bags in their own way, but to easily access and see how others have defined their bags, makes cooperative work as easy as I believe it can be. There are no split/merge/re-split/re-merge wars if everyone can have their own bags. You don't end up with the single, Star Trekkian family tree of all humankind, but you end up with an easily traversable network of the overall collective wisdom about the entire family tree.

In other words, as I interpret things, the NFS is a "classic" Persona/Person system. My main concern with the Conclusion model is that it support this duality in the person concept. I personally believe that the Persona concept is the key concept necessary to enable the next generation of genealogical software, so I'm always looking at models to see if they encompass the concept.

As you know I go much further than this in my own thinking, believing that a full, unrestricted, tree of "Person bags" be embraced. I have used this concept in a project that has successfully recognized 100,000s of Persons from an input set of billions of "Personae," so I probably have more experience with the efficacy of the idea than most. And all that is needed to add this power to any model is to allow Person/Persona objects to have references to other sub-Person/Persona objects.

heatonra 2012-08-18T10:54:08-07:00

Thanks for everybody's comments. It's enriching to hear everybody's philosophies.

"all I can suggest is: Make love, not war."

Excellent suggestion. Like I said, I'd be eager for the opportunity to get to know him personally. Maybe I'll send him a personal e-mail or something.

louiskessler 2012-08-18T14:37:23-07:00

Ryan,

Then come to the BetterGEDCOM Developer's Meeting on Monday from 12 noon to 1 p.m. CDT. You're MDT aren't you, so it would be 11 a.m. to 12 noon for you.

I should be there this week and Tamura is almost always there.

The GoToMeeting link is at:
http://bettergedcom.wikispaces.com/Developers+Meeting

The BetterGEDCOM meetings are somewhat informal, with GeneJ coordinating.

We'd love to have you.

Louis

GeneJ 2012-08-18T19:19:48-07:00

Randy Seaver's blog featured an article yesterday, "What are FamilySearch's Intentions for GEDCOM X?"

http://www.geneamusings.com/2012/08/what-are-familysearchs-intentions-for.html

Have posted a response, more or less as below:

Thank you +Randy Seaver and +Russ Worthington . A few thoughts below, mostly following those I posted on your blog.

I hope folks who read "Whence FHISO?" will then read, or read again, "Why FHISO?"

http://fhiso.org/2012/07/why-fhiso/

I'm among the international group of volunteers working on FHISO. This independent group is actively communicating with vendors and genealogical organizations all over the world about FHISO. From time to time, we are asked to explain the difference between FHISO and FamilySearch's effort on GEDCOMX. To that end, there were a couple of things in Ryan Heaton's posts this week that are helpful.

(1) Ryan wrote, "[W]e need to figure out how to best work together to get the work done." [a]

Ryan's talking to all of us. The way forward for our diverse community IS along the path of multi-stakeholder governance already followed by most business sectors. It is a brightly lit path--see the Why FHISO? document (link above).

(2) "[All] of the requirements for the [GEDCOMX] project can be summarized into a single statement: GEDCOM X must be able to accommodate FamilySearch's Platform API."[b]

While that might be disappointing, was it ever reasonable to assume the proprietary development effort of one vendor, in this case, FamilySearch, would have fairly considered everyone's needs/every competitive interest?

From Why FHISO? (section three), "The Family History Information Standards Organisation (http://fhiso.org/) was created to develop international standards based on the principles of diversity and due process. Standards developed by the organization will better meet the different and competitive needs of all service providers, program developers and users--globally."

Equally important, from the same section, "Developers will be able to adopt a single [FHISO] standard with the confidence that their product meets expressed community requirements." In FHISO's case, requirements are documented from the get-go. "Identify needs" is the first step in the standards development cycle.

Elaborating on Russ' comment to your article, it is time. Some of us have been waiting at least 16 years. Let's do it. fhiso@fhiso.org --GeneJ

[a] "Whence FHISO?"
http://familysearch.github.com/gedcomx//2012/08/15/whence-fhiso.html
[b] "GEDCOMX Requirements"
http://familysearch.github.com/gedcomx/2012/08/14/requirements.html

AdrianB38 2012-06-19T04:38:29-07:00

My understanding of BG's goals is this:
- BetterGEDCOM will be a file format for the exchange and long-term storage of genealogical data.
- It will be more comprehensive than existing formats and so become the format of choice.

(See http://bettergedcom.wikispaces.com/Introduction+to+Goal+and+Requirements )

This is clearly a higher level pair of goals than Ryan's. My issue with the goals that Louis has copied, is that if you need to explain it, with cross references, it ain't working as a goal. As Tom said on the cross referenced page, "If you can't say it in a single sentence, I'm not all that interested. Sorry." (That may not be the exact context, but it's a sentiment I concur with as far as a _statement_ of goals goes. Except my quote of BG's goals needs 2 sentences. Sorry. <grin>)

OK - so, having established that I'm not keen on the language, does that actually matter? Is the GEDCOMX goal actually OK even if it's not concise? I'm not keen on it for the following reasons:

- "To define an open data model". I like "open" and I guess it ought to have been in our goals. "Data model" I don't like. A data model is simply a means to an end, a way of communicating between IT geeks. After all, GEDCOM doesn't have an officially agreed data model and that's worked (sort of) for years.

- "an open serialization format". Again, "open" is good. But why "serialization format"? That is geek-speak. What's worse is that it's geek-speak that only a sub-class of IT geeks understand. To most of us, it's a "file format". I see absolutely no reason why I can't talk about the file formats of lumps of data inside a zipped file, even though having files within files is illogical from a strict viewpoint. Most of us IT geeks understand recursion and virtual concepts, we can cope with files inside files. I'd prefer this to be stated as "an open file format" for general consumption, and the technical documents can say, "Of course, we mean serialization format, here." Nevertheless, this point is perhaps more about me having a moan than anything serious - once you've tweaked the words, it becomes like our "file format", so it's a minor point.

Now, what's the format for?
BetterGEDCOM is "for the exchange and long-term storage of genealogical data";
GEDCOMX is "for exchanging the components of the genealogical proof standard".

I have an issue with the omission of "long-term storage" from GEDCOMX's goals. If you omit it, and you have omitted the term "file", then you have absolutely no aim to produce a chunk of data that can be saved off for future use. In other words, press the <enter> button, it's there, it's transmitted to the mother ship, it's gone. Where's my copy to keep? Where's the copy in case the mother ship crashes and burns because someone's bought it out who doesn't like the finances? All that is why I want a copy to keep and the term "long-term storage" is designed to give me that copy _and_ to ensure that copy is readable in the future by someone with different software.

Carrying on with, "what's the format for?", BG has "genealogical data"
GEDCOMX has "components of the genealogical proof standard". Again, I don't like this - the GPS is about how you do genealogy if you're a serious student of it. In the USA. And a few other places. I'd been working on my family history for a couple of years before I heard of the GPS. Does that mean GEDCOMX would have had nothing to offer me? This is a genuine concern - there's a world of difference between "genealogical data _including_ components of the genealogical proof standard" and "components of the genealogical proof standard" because the first allows the possibility that the user might actually not be working to the GPS.

The worlds potentially collide with how (to use sort-of-GEDCOM terminology) to link "facts" to the "sources". In GEDCOM there's the concept of a pointer which gets called "citation" (Yes, I know, please don't argue with that as a name, I'm just saying what GEDCOM type folks often call it) and this "citation" says which source contains the data, where in the source, whether it's primary, secondary for that fact, etc, etc.

If you want to work to the GPS you _MIGHT_ say that this link actually contains a proof (proof statement, proof argument, whatever) and from _there_ we point to the relevant source records. Aside from the fact that the source records relevant to a proof might include many more than just the one for the fact, just what do you do if you're like 99.9% of the GEDCOM files in existence and don't have a formally recognised proof to put in that potential thing that GEDCOMX might require at this point?

So, yes, I absolutely would want both BG and GEDCOMX to _include_ the GPS components but the goal has to be aimed at genealogical data as a whole first.

louiskessler 2012-06-19T16:31:59-07:00

Adrian,

That's an excellent explanation!

Louis

GeneJ 2012-06-24T12:24:16-07:00

(1) Information about the Genealogical Proof Standard (GPS)

From _The BCG Genealogical Standards Manual_, 2000 (herein, BCG Manual), p. 1 ("The Genealogical Proof Standard"): "The ultimate goal for all genealogists is to assemble (and perhaps share with others) a reconstructed family history that is as close to the truth as possible."

The text continues, saying that "to achieve that goal, we adhere to an overall standard by which we measure the credibility of the statements we make about ancestral identities, relationships, life events, and biographical details. This credibility standard is called the Genealogical Proof Standard. … "

(2) Perhaps some clarification/context.

(a) The "Genealogical Proof Standard" (GPS) is published by the Board for Certification of Genealogists (BCG).
http://www.bcgcertification.org/resources/standard.html

I know Ryan and others at FamilySearch have put a lot of hard work into the goals of GedcomX, but when referring to the standard, it might have been better to link to its presentation on the BCG site rather than risk interpretation of it in a summary. For example, from my perspective, there is a difference between "components of" and "elements of" (the published GPS is made up of "elements"). Ditto, "Search Reliable Sources" is not an element of the GPS. The element is, "Reasonably exhaustive search" … And the element is not "Cite each Source," it is "Complete and accurate citation of sources…."

(b) While presented as an "overall goal," the GPS itself is _one_ of a group of BCG published "Research Standards." [_BCG Manual_] In addition to the GPS, there are 18 "Data Collection" standards; 16 "Evidence-Evaluation" standards; and 22 "Compilation" standards. [Observation of _BCG Manual_]

(c) Most of us probably think of the BCG as a US organization, and fairly so as it has probably certified more individuals there than in other places. The BCG does, however, conduct certification world-wide. See their site for the page, "Find a Genealogist." There are eighteen different countries listed in the drop down box, "country."

From my outsider perspective, GedcomX is first and foremost an API supporting FamilySearch's proprietary effort on its new tree … and I suspect the requirements behind GedcomX are probably driven mostly by that same proprietary effort. What is said about the tree (and GedcomX) probably tells us more about how the church wants to market the tree than about the problems GedcomX will solve when introduced.

I'd like to see a real community standard developed, à la FHISO. As part of FHISO, I'm among those who hope there will be _a_ FHISO project team in which those who practice the GPS engage with technologists and the community to determine an initial set of related data requirements supportive of the GPS. I believe we would learn from the open discussions and benefit from the results.

Alex-Anders 2012-06-24T14:52:22-07:00

From my outsider perspective, GedcomX is first and foremost an API supporting FamilySearch's proprietary effort on its new tree

This is how I see GEDCOM-X.

But I also see no other viable alternative at present.

GEDCOM 7 from Louis Kessler is the next option I will look into.

bamcphee 2012-06-24T21:29:07-07:00

The purpose of GEDCOM X has been stated as:

To define an open data model and an open serialization format for exchanging the components of the genealogical proof standard.

My question: With whom, or with what, will this exchange be? Is it FS to software, or will it be, as individuals hope, between software or individuals? Will FS need to be in the loop for this to work?

Alex said: GedcomX is first and foremost an API supporting FamilySearch's proprietary effort on its new tree.

Is this true or not???

ttwetmore 2012-06-25T14:38:56-07:00

GEDCOMX is NOT an API. It is a file format based on a data model. For some gobbledegooky reason, file formats are now called serialization formats. The only reason to say serialization rather than file is that a GEDCOMX transmission might simply squirt through a wire or a cloud from one application to another without ever manifesting itself as a file along the way. It's still a file format.
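A rough illustration of that point: the same serialized text can land in a file or travel over a wire, and either way it is the same format. JSON is used here purely as a stand-in, and the `person` record and the function names are hypothetical, not anything taken from the GEDCOM X specification.

```python
import io
import json


def serialize(record: dict) -> str:
    """Turn an in-memory record into text; sort_keys keeps the output stable."""
    return json.dumps(record, sort_keys=True)


def deserialize(text: str) -> dict:
    """Rebuild the in-memory record from its text form."""
    return json.loads(text)


person = {"id": "p1", "name": "John Smith"}  # hypothetical record
payload = serialize(person)

# Destination 1: a file-like object (io.StringIO stands in for a file on disk).
as_file = io.StringIO()
as_file.write(payload)

# Destination 2: bytes handed to a network connection, never touching a disk.
as_wire = payload.encode("utf-8")

from_file = deserialize(as_file.getvalue())
from_wire = deserialize(as_wire.decode("utf-8"))
```

Both routes give back the same record, which is why "serialization format" and "file format" describe the same thing in practice.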

The files are exchanged between programs, possibly desktop genealogy applications, possibly on-line family tree archiving programs, possibly on-line services that provide records in response to your search requests. Any software that might want to provide, consume, store or exchange genealogical data are the "with whoms" you are asking about. Even your lowly text editor is one of the "with whoms" if you choose to look at a GEDCOMX file with it.

FamilySearch systems will be able to accept data in GEDCOMX format, and they will be able to provide data in GEDCOMX format. Imagine going to the FamilySearch site and searching for information. If good results are found you'll be able to push a button and get the records sent to you in GEDCOMX format. Other programs that decide to support the GEDCOMX format will be able to import and export data in GEDCOMX format. The systems that support GEDCOMX are simply the systems that choose to. Nothing more nothing less.

The objects stored in GEDCOMX files, and therefore defined by the model, are objects of genealogical interest. For some other gobbledegooky reason, these objects were called "components of the genealogical proof standard" in the sentence quoted above. You can chalk that up entirely to the author of that sentence wanting to sound politically correct. Components of the genealogical proof standard simply means persons, dates, places, sources, events, evidence, records, relationships, conclusions, etc, that can be organized logically in such a way as to provide good justification as to why you believe the persons in your database are who you say they are. The model provides the specifications for all these types of objects and how they must relate to one another.

louiskessler 2012-08-14T17:09:07-07:00

Revision!

GEDCOM X has just ONE requirement:

"GEDCOM X must be able to accommodate FamilySearch's Platform API."

See: http://familysearch.github.com/gedcomx/2012/08/14/requirements.html

Louis

louiskessler 2012-08-14T17:36:37-07:00

... So the purpose is to define an Open Data Model

... and the requirement implies that this Data Model only need serve FamilySearch.

Louis

ttwetmore 2012-08-14T21:29:41-07:00

Louis,

Thanks for posting the link. That requirement is not surprising, though a little disconcerting. Ryan softens the blow with the rest of the note.

I would like to know whether FS is still breaking GEDCOM-X down into two separate Record and Conclusion models, or whether they might try to integrate them before the first release. Both must refer to sources so I would expect that if they keep 'em separate they might split out the "Source Metadata" model as well. Three models.

The last I read they were coming out with the Record model first. In that model the Person objects are really the same concept as the DeadEnds Persona object, so I'm pleased about that. But the entire rest of the genealogical software industry needs the Conclusion model, since that is the type of data they deal with the vast majority of the time. Without a Conclusion model it makes almost no sense for any non-FS organization to convert to GEDCOM-X (other than as a way to import purely evidence data from a data service provider).

Or maybe they are coming out with the Conclusion model first, though I don't think so. I think their first order of business is establishing the model they need to hold the massive amounts of data they are now "computerizing" from their microfilm collection -- all record related stuff.

And of course I am most interested in how FS intends to make the two models fit together. I fear they might keep the two models independent of each other, but I don't think they are obtuse enough to do that. Which means there will have to be some way to relate the Person objects in the Conclusion model with the Person (Persona) objects of the Record model. Will that be a two-level relationship (required) or the multi-layer relationship (best)? Hope they decide my way is the way. I harp on that issue a lot, but the GEDCOM-X folk have yet to make a firm statement about any of this double model stuff.
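The difference between a two-level and a multi-layer relationship can be sketched with a single self-referencing structure. This is an assumption-laden illustration, not the GEDCOM-X or DeadEnds model: the `Person` class, its `based_on` field, and the helpers `depth` and `personas` are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str
    # Lower-level persons this one is built from; empty for a record-level persona.
    based_on: List["Person"] = field(default_factory=list)


def depth(person: Person) -> int:
    """Number of layers under and including this person (a bare persona is 1)."""
    if not person.based_on:
        return 1
    return 1 + max(depth(p) for p in person.based_on)


def personas(person: Person) -> List[str]:
    """Names of the record-level personas at the bottom of the tree."""
    if not person.based_on:
        return [person.name]
    names: List[str] = []
    for p in person.based_on:
        names.extend(personas(p))
    return names


census = Person("J. Smith (census)")
baptism = Person("John Smith (baptism)")
burial = Person("Jno. Smith (burial)")

# Two-level: one conclusion person pointing straight at the personas.
two_level = Person("John Smith", [census, baptism, burial])

# Multi-layer: an intermediate grouping sits between conclusion and personas.
early_life = Person("John Smith (early life)", [census, baptism])
multi_layer = Person("John Smith", [early_life, burial])
```

In this sketch the two-level case is just a multi-layer tree that happens to have depth 2, so a model allowing arbitrary depth would cover both.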

I really hope they will steer clear of all the RDF/URL/semantic stuff, as it will turn off anyone who has to deal with the archive format. Will it just remain an anecdote or will it rear its ugly head once again? I've pointed out to them that any semantic guru wonk would be able to convert from a simple archive format into an RDF format by a simple translation process, as long as the tags used are mapped to the proper semantic domains in the model specifications. It can ALL be handled by good specifications. I think they went the RDF/URI direction to prove that they knew how to be modern and politically correct, and look really smart, and maybe, just maybe, they really thought it was a good idea, but I hope and pray that common sense is now prevailing.
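The "simple translation process" mentioned above might look something like the following sketch, which turns a flat tagged record into subject/predicate/object triples using a lookup table. Everything here is assumed for illustration: the tags, the `example.org` URIs, and the function `to_triples` are hypothetical and do not come from any GEDCOM-X specification.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mapping that a specification could publish: plain tag -> semantic URI.
TAG_TO_PREDICATE: Dict[str, str] = {
    "name": "http://example.org/terms/name",
    "birth": "http://example.org/terms/birthDate",
}


def to_triples(subject: str, record: Dict[str, str],
               mapping: Dict[str, str]) -> List[Triple]:
    """Translate a simple tagged record into triples, skipping unmapped tags."""
    triples: List[Triple] = []
    for tag, value in record.items():
        predicate = mapping.get(tag)
        if predicate is None:
            continue  # no semantic mapping published for this tag
        triples.append((subject, predicate, value))
    return triples


# A record as it might appear in a simple archive format, with no URIs in sight.
record = {"name": "John Smith", "birth": "1840", "note": "unmapped tag"}
triples = to_triples("http://example.org/person/p1", record, TAG_TO_PREDICATE)
```

The archive format itself stays free of URIs; only the mapping table carries them, which is the separation argued for above.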

Tom, CBW, DeadEnds Software

louiskessler 2012-08-14T22:19:39-07:00

Tom,

If nothing else, Ryan seems to have been trying to digest all the comments and criticisms (whether constructive or not) that have come his way.

He's found that he can't please everyone. And he's realized that the only one he needs to please is FamilySearch.

In order to start making progress again, he/they are going to have to start making some decisions.

So I'm sure that this new "requirement" will now become the basis on which the decisions are made.

Not judging. Just saying.

Louis

ACProctor 2012-08-15T02:14:50-07:00

Ryan has been very honest in that post, and I applaud FS for supporting it. GEDCOM-X always has been a proprietary venture, and has very specific requirements within FS. There's nothing wrong with that, of course, as long as its requirements and goals are stated clearly and honestly.

One potential advantage of this positioning is that it would allow FS to participate more freely with true community projects without having to defend their internal deadlines and requirements. Any community project is bound to take longer, and must accommodate the requirements from multiple stakeholders, so it's easy to see why that would have been hard to marry with the GEDCOM-X project.

Tony

heatonra 2012-08-15T09:45:52-07:00

Hi guys.

I was just lurking a bit. Thanks for your comments.

@louiskessler, this requirement isn't new, it's always been there. It just was never said out loud until now. So when you say that you're sure this "new" requirement will be the basis on which other decisions are made, I just don't see it that way. To use a sports analogy, every game has some boundary lines.

For the game of GEDCOM X, that's one of the boundaries. It's not as restrictive as you think. It's still a pretty cool game.

@ttwetmore The conclusion model is our primary focus right now and it will be established first. We're making significant progress in accommodating the record-based research that you need in the conclusion model. Still a lot of work to do, but we'll get there eventually. See issues 144 and 182 and the (still open) issue 202.

Also, when I say we're getting rid of the RDF noise in the serialization model, I mean we're really getting rid of it. You should see some pull requests coming through soon that will do that. If it ever comes back, it will be in the form of another distinct and separate serialization format as you describe.