What are the elements of a BetterGEDCOM data model, including their precise definitions, uses and scope? What are the relationships among these elements?

What IS And What IS NOT Core To The Initial BG Project Replacing GEDCOM?

Are there elements or aspects that can be codified as part of the concept of a genealogical research workspace that need to be identified and considered part of the next phase of work?

Person entity

Evidence and Source class of entities
Source Entity

Citation Entity
A citation entity in a computer database is the relation between a source and a piece of evidence or genealogical fact as well as notations such as footnotes, endnotes and the like.

Location entity

Use Of Templates

Templates can be used to afford both users and developers maximum flexibility in using the BetterGEDCOM standard's components. This area discusses how templates can be employed in various parts of BetterGEDCOM.

Family, Group and Relationship entities
Family, Group and Relationship entities are similar concepts, often with very different meanings.

Event entity

Events are usually thought of as an association between:
1) one or more people
2) at a date/time or over a period of time
3) at a given location or locations
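
As a rough illustration of the three-part association listed above, here is a minimal sketch of an event entity. The class name, field names and ID strings are all hypothetical; nothing here is an agreed BetterGEDCOM structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical event entity: people, a date or period, and locations."""
    kind: str
    person_ids: List[str] = field(default_factory=list)    # 1) one or more people
    date_start: Optional[str] = None                        # 2) a date/time ...
    date_end: Optional[str] = None                          #    ... or the end of a period
    location_ids: List[str] = field(default_factory=list)  # 3) one or more locations

    def is_period(self) -> bool:
        # An event covers a period only when it has a distinct end date.
        return self.date_end is not None and self.date_end != self.date_start


# Illustrative IDs only; they do not refer to real records.
birth = Event(kind="birth", person_ids=["I1", "I2", "I3"],
              date_start="1860", location_ids=["P1"])
residence = Event(kind="residence", person_ids=["I1"],
                  date_start="1881", date_end="1891", location_ids=["P1", "P2"])
```

Keeping people and locations as lists of IDs is one possible choice; it leaves room for roles to be attached to each link later.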

Property, Attribute, Characteristic or Fact (PAC) Elements

Property, attribute, characteristic and fact are all names for the same concept: each ascribes some value of some property to some entity. PACs exist only insofar as they describe some aspect of the state of a real stand-alone object. Hair color is a PAC of a person. A date is a PAC of an event. A role player is a PAC of an event. An occupation is a PAC of a person. PACs are naturally recursive and can have their own sub-PACs. PACs can have dates associated with them.
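
The recursive nature of PACs can be sketched as a small tree structure. This is only an illustration under assumed names (`PAC`, `sub_pacs`, `flatten`), and the sample values are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PAC:
    """Hypothetical PAC: a named value, an optional date, and nested sub-PACs."""
    name: str
    value: str
    date: Optional[str] = None
    sub_pacs: List["PAC"] = field(default_factory=list)

    def flatten(self, prefix: str = "") -> List[Tuple[str, str]]:
        # Walk this PAC and its sub-PACs, returning (path, value) pairs.
        path = f"{prefix}/{self.name}" if prefix else self.name
        rows = [(path, self.value)]
        for sub in self.sub_pacs:
            rows.extend(sub.flatten(path))
        return rows


# Made-up sample data: an occupation PAC of a person, dated, with one sub-PAC.
occupation = PAC("occupation", "weaver", date="1881",
                 sub_pacs=[PAC("employer", "Example Mill")])
```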

Person-Name Elements

Date & Time Elements

This also encompasses the use of different calendar systems.


ttwetmore 2010-11-16T04:35:01-08:00
What Do We Call Them Things?
Looking through the pages and discussions on the BetterGedcom Wiki we are treated to the wonderful robustness and redundancy of the English language.

For example, we are discussing the creation of a data model. Does a data model define classes? Or types?

When one has either a class or type and wants to talk about something that is one of them, is that thing an object, an entity, an element, an item, a record, or what?

In my opinion, for the purposes of BetterGedcom, class and type are synonyms. And entity and object are synonyms. Element is also a synonym, but its use could be confused with an XML element, and since XML is often brought up in the BetterGedcom context it would be better not to use it. Item is just too general a term to embrace. Record is almost a good synonym, except that the term record is usually used to refer to a complete entity in a database. Since most objects/elements/entities/items from a BetterGedcom data model should someday be stored in a database, record seems okay.

But something that argues against using the term record is that BetterGedcom will likely end up with "top level" elements/objects like Persons, Events, Sources, etc., but also with "interior" elements/objects like Names, Dates, Roles, which, though objects in their own right, are only found inside top level elements and so would never be database records in their own right.

But I guess it would still be okay to use record as a synonym for the top level objects since they are certainly intended to be used as database records some day.

So I ask again, "What do we call them things?" I would propose class for the first, and object or entity for the second, allowing record to be used for the top level entities.

Tom Wetmore
greglamberson 2010-11-16T05:42:56-08:00
Really, I think each person as they discuss these issues is going to have their own definitions, and they must define these items within their writing for now. For the main pages, we'll have to define these items within the page.
GeneJ 2010-11-16T11:32:14-08:00
How about:

and sometimes ...Roles
AdrianB38 2010-11-18T09:31:20-08:00
"Data Model" has specific meaning within IT and there are specific terms to go with it.

1. A Data Model defines
(a) "Entity Types" (the things about which we talk).
(b) Attributes (the properties - in non-IT language - of the entities).
(c) Relationships between the entity types.

"Entity" and "Entity Type" are a bit flakey and Entity is sometimes used vice Entity Type, but strictly the overall type is the "Entity Type", and the individuals are the Entities (e.g. "Entity Type" = spaceship, "Entity" = NCC1701E USS Enterprise)

Objects and Classes (in IT speak) belong to Object Oriented work, which is not what we are talking about here. Anyone wishing to write code to produce software to handle BG data might follow an OO approach, but that's their prerogative and need not concern us.

Record tends to be used to refer to the physical files - at this stage (Data Model) we're just doing the theory.
greglamberson 2010-11-18T09:43:27-08:00
Well, guys, who wants to begin defining the entities? Have at it. Please don't feel constrained from editing the main pages. The main pages aren't for conclusions by any means. This is the sort of content that should be sliced and diced on the main pages, I think.
ttwetmore 2010-11-18T14:29:04-08:00

I think we understand that, but for all practical purposes, in almost everybody's minds, entity type (or just type) and class are synonyms, and entity and object (and element and instance and item and record to some) are synonyms. Sounds like you prefer the terms entity types and entities. I can go with that just fine. I just asked the question to provide some stimulus so we would decide!

I'm not sure what all this IT speak you refer to is. I've been in the IT world (though I always just call it software) for 43 years. Maybe I've developed a bad accent!

And, at least in my mind, we are not just at the data model stage. My mind is circling around all kinds of ideas, file formats, database technologies, container ideas, representation issues, and so on. I've been in and out of the data model world for genealogical software for almost 20 years. I would say there is almost no idea about genealogical models and their realizations that I haven't discussed (or implemented!) over the years. I am a proponent of the do-it-all-at-once model.

Tom Wetmore
GeneJ 2010-11-16T09:30:55-08:00
Key elements of "person" identification
The "Genealogical Summary Paragraph" is the first section of a biography and it's thought the style guides for the journals _Register_ and _NGSQ_ contain information that quickly and accurately identifies most persons. These style guides assume differences in family circumstance and extant records (my words).

Birth event (date, location)[best knowledge of]
Death event (date, location)[best knowledge of]
Parents names
Spouse(s) (birth, death, parents names, recognition of prior spouse as appropriate)

Birth is sometimes given as baptism, as it might represent the best knowledge or evidence of the birth event. Death is sometimes given as burial, representing the best knowledge or evidence of the death event.

Hope this helps.
ttwetmore 2011-01-07T16:01:26-08:00
I'm not a fan of ASSO either, but I believe BG should have the feature, which is called "relation" in DeadEnds.

In terms of two way links, the external file format does not generally require them. Redundant information is often removed before writing information to an external archival format. The redundant data can be reconstituted either in the database format or when the data is read from the database during processing.

Like you say, you do have to match the name of the relation tag with the pointing direction. In DeadEnds I use the tag that indicates the relationship of the thing being pointed to to the thing doing the pointing. For example, when a person points to his father the relation is called "father". This is an easy convention. In the witness case, if an event points to a witness the relation on the pointer is "witness". To choose a name for the return link it would have to be something like "witnessIn" or "witnessOf".
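
To make the naming convention concrete, here is a minimal sketch of one-way links as they might appear in a file, plus a reverse index rebuilt after loading. It is not DeadEnds code; the tuple layout, the `INVERSE` table and the inverse name "child" are assumptions made for the illustration ("witnessIn" is one of the names suggested above).

```python
from collections import defaultdict

# One-way links: (source, relation, target). Following the convention above,
# the relation names what the target is to the source.
links = [
    ("boy", "father", "dad"),
    ("wedding", "witness", "neighbor"),
]

# Hypothetical inverse names; only relations with a simple inverse are listed.
INVERSE = {"father": "child", "witness": "witnessIn"}


def build_reverse_index(links):
    """Reconstitute the return links that were left out of the file."""
    reverse = defaultdict(list)
    for source, relation, target in links:
        reverse[target].append((INVERSE[relation], source))
    return dict(reverse)


reverse = build_reverse_index(links)
```

A relation without a simple inverse would need more than a lookup table, which is the case flagged in the next paragraph.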

You say you think we need two-way roles for the events and groups to/from persons. I agree when the records are in the database or in memory, but they are not required when the records are just in the BG files. Unless, of course, some of the relations do not have simple inverses; in those cases there might be more to consider. There are many possible internal implementations of two-way links, making their use efficient and quick.

Consider your example of the complex relationships in a family, with a boy being son of father (of family) and step-son of mother (of family). In my way of thinking this is handled by the boy having a pointer to his natural father with role "father", a pointer to his natural mother (if known) with role "mother", and a pointer to his natural father's second wife (let's assume that's who his step mother is) with role "stepMother." This puts all the details outside the family/group responsibility, which makes the most sense to me, since we are talking about direct interpersonal relationships, not relationships mediated through the family object. The roles in the family would then capture the relationship between the child and the family as a whole. Not sure what you'd call that relation, but I suppose a generic "child" relation could cover it.

It could be that in the most pathological cases some info would have to be put in notes. This is something that we all probably believe should be avoided whenever possible, but there will be times when it is needed.

Tom Wetmore
GeneJ 2011-01-08T08:25:21-08:00
Insightful, as always, posting by Bob Velke this morning on the TMG mailing list about the differences between biological and social constructs in the "parent"-child relationship.


He concludes, "Can we agree that there is a difference between biological relationships
and social relationships (without making a value judgment)? If so, can
we agree that genealogy software should recognize that difference?"
louiskessler 2011-01-08T10:04:39-08:00

See Tamura Jones' defining document about this:

and be sure to visit the links at the bottom of that page as well.
GeneJ 2011-01-08T13:35:26-08:00
Thank you. I've seen Tamura's thoughtful article before.

To the extent that "official genealogy" (as opposed to genealogy software) is describing modern practices, then I'd move "vital records" to "legal genealogy."

At least from my standpoint, modern genealogy practices recognize all the relationships described. That's not to say genealogy software generally enables all those relationships.

From the BCG (Board for Certification of Genealogists) website, "FAQ," in part, "Definition of Genealogy," ... "The advent of genetics has created yet another specialty: genealogists whose expertise lies in the interpretation of DNA results and its application to genealogical research problems."

See also, Curran, Joan F., Madilyn C. Crane, and John H. Wray, edited by Elizabeth Shown Mills, _Numbering Your Genealogy: Basic Systems, Complex Families, and International Kin_, rev. ed. Washington: NGS, 2008.

I'm not an expert on how genealogy software handles the information, but I have the impression there are problems.

(1) If a male has four different fathers, say one each for biological, "legal," adoptive and "step," will that unique child (one person) show as the son of all those fathers?

(2) If there are all these relationships, are they all reported (even though only one might be considered primary at any one time)?

(3) How well does software handle multiple lines of descent when those multiple relationships are otherwise represented in the family tree?

The thread to which Bob Velke was responding was a bit different, as the issues didn't involve a biological father. The thread involved same-sex (female) partners, one of whom was represented as the biological mother of the child. The other parent has no biological connection to the child. (Hence Bob's comment, "difference between biological and social relationships.") The point behind that thread, I understand, is: why doesn't this software recognize two parents of the same sex in parenting roles (as opposed to one parent always male, and the other always female)? In other words, can a person have two primary "moms" or "dads"?
mstransky 2011-01-09T02:09:41-08:00
ALL, Sorry I have been off line for a while. I am getting back into the swing of things.

"If so, can
we agree that genealogy software should recognize that difference?"-GeneJ

I ran into that problem and had decided that a person can only come from a mother and a father. The HUSB and WIFE have been a pedigree marker that captures relationships automatically for birth parents and step parents without the need to make ASSO to tag relations twice, once in each direction.

However, let's say test tube births ever happen; that could put a twist on that original design.

Ok, about adoptions: that person would still have had two unknown parents, mother and father. But people wish to display such info. This is where I decided NOT to create crutches out of the pedigree section of HUSB and WIFE. I decided to display a household view of the person in question. When THAT person is displayed, the source events associated with THAT person in question would capture the other so-called relations of two Moms, step parents, legal guardians etc., all with THAT person in question, without creating extra dual paths of ASSO pointing in reverse duplicate paths and making redundant extra data.

To me, trying to make a square peg fit a round hole is fruitless. I am still looking at keeping the pedigree HUSB and WIFE as is!

I incorporated a source record 'ROLE' for people, which displays those things people are without trying to link extra, exhaustive, redundant ASSO data.

I think it is great to capture such relations and display them, but...
1. To get rid of the pedigree design because it does not support two mothers or legal parents is wrong. The pedigree had and still has its purpose and should keep its simplified design.
2. YES, capture legal parents, adoptive parents, and, say, same-gender parents outside of the pedigree via a source/event record. Source records which capture the roles of such TWO guardians, whether they be legal, adoptive or same-gender, are the way to go.

If done the way I do it, it would not matter if the relationship is biological or social. These would all show up as source record sets set apart from the pedigree part of the data. Then there is no modifying the FAM, and it still gives a quick view of such other data relations without making extra, extra, extra data for a database to handle it all.

I don't know if anyone else has had workarounds?

I guess the simple way to say it is this...
I use an event "ROLE" as the "relationship to the PERSON in question" without creating the data a second time in an ASSO relation tag.

Has anyone else ever done it this way?
brianjd 2011-01-09T08:38:54-08:00
Let us not forget, that surrogate mothers are quite common these days. Women carrying someone else's embryo, AI etc. So it's already complicated.

Of course, there are going to be many ways of tracking these more unusual cases. Some people will want to keep track of everything, and others will want to track only the intended parents. Plus not every child is part of a HUSB and WIFE family. Lots of people never get married. Many times one parent is not known. Not that I advocate changing the model, but rather enhancing it with perhaps a type characteristic. So you could have a HUSB record which has a characteristic TYPE equal to WIFE. Then every kind of family relationship is still captured by the HUSB and WIFE tag, but the tag has a TYPE that defines the displayed relationship type. No need for the ASSOC tag, or rather, bury it inside of the HUSB and WIFE tags, in effect overriding the normally displayed label. At least for those genealogy programs that eventually offer that ability.

As far as how I track these odd relationships: I have a family with a surrogate in it. While the family keeps in contact with the surrogate, the child is unaware of this relationship. I have therefore created two trees, and in the one with the surrogate as the mother, the relationship is recorded in a note. I know the surrogate mother has some interest in genealogy, so the information will likely reach the child someday, but I don't need to facilitate it by adding all kinds of complicated records in a single database the child is likely to see. The simple answer would be to merge the two trees at the point of the split, as the child will likely always consider the mother-who-raised-her family as her real family, and may come to want to know her natural family too.
AdrianB38 2011-01-09T08:41:17-08:00
Tom - re your use of ASSO. I am intrigued by your advocacy of using it even for quasi-parental relationships. I'd tried to get away from ASSO in my mental view of things, because it seemed to introduce another way unnecessarily.

Can I just see if I understand your view by trying the James P Bruce example from my previous response?

First we have the birth event for James:
- James P Bruce: birth in 1860 in Dundee, role in event of "James P Bruce" is "son", role of "John Bruce" is "father", role of "Margaret Bruce" is "mother";

Then we record the fact that James was effectively a member of his father's 2nd family, viz:
- "James P Bruce" is a member of the family-group "John & Annie Bruce's family" with a role in 1891 of "child"

And am I right to believe that you would then make clear the relationship between James and Annie by additionally having:
- James P Bruce is associated with Annie Bruce as "step-child" (assuming the reverse relationship auto-generates) (with a date???)

Whereas if I followed my idea, I'd have the first 2 'facts' but not the 3rd, leaving my software to deduce that James was a step-son.

And in the Pickstock example, we'd have
- "Arthur Elliott" is a member of the family-group "James & Mary Ann Pickstock's family" with a role of "child"

- Arthur Elliott is associated with James Pickstock as "informally-adopted-child"
- Arthur Elliott is associated with Mary Ann Pickstock as "informally-adopted-child"

The advantage of this is that it avoids that clunky role of "informally-adopted-child" or "child" with sub-role "informally adopted".

What would you do with a formal adoption for which a genuine event exists? Is it sufficient to use the adoption-event, without an ASSO? (That's where I get slightly concerned - that the details of the relationship are held both in ASSO records and Event records. But maybe we justify it by pointing out that for Step and Informal-adoption, there is no event...)

And just to add spice to it, how about the case where a step-child is later formally adopted? I would reckon an ASSO for the step-relationship _with a date range_ followed by an Adoption event. Right?
AdrianB38 2011-01-09T08:52:52-08:00
"not every child is part of a HUSB and WIFE family. Lots of people never get married"

Absolutely - that's why I'm always careful to say that the roles in the family are "father" and "mother". But even there, I need to emphasise that it's "father" and "mother" in the widest sense of (assumed-)biological OR step OR adoptive.

Actually - I should just say "parent" for the role, then it covers families with same-sex parents. Though "parent" still needs mentally expanding to cover (assumed-)biological OR step OR adoptive.
GeneJ 2011-01-09T10:29:39-08:00
I pulled out _Numbering your genealogy_ this morning, so I'll post information here as I believe the chapter, "Complex Families" by Madilyn Coen Crane is helpful.

She first describes the problem as "Traditional numbering systems were designed to present a group of people, all blood kin, who descend from a single immigrant ancestor ..." Also notes, "Many sensitivities influence a decision whether adoption should be made known to a child or whether it should be recorded in the family chronicle, and most views have merit. When writing of the living, genealogical ethics dictate a respect for the wishes of those involved. When studying past generations, more options are available. Because of the serious genetic issues at stake, as science continues to explore and treat inheritable medical conditions, this paper recommends that adoptions of past eras be treated as frankly as all other aspects of genealogical research."

In the "Solution" she shows how the NGSQ (_Quarterly_) genealogical style guide has been expanded to handle "surname changes, step relationship, and adoptions."

In the spirit of our work, I'll include extended clips from this _2008_ work. I hope more of us will pick up a copy of the book; it includes examples of some of these items in practice. The book is available in pdf form for a song and dance at the NGS website.

• An adopted child or stepchild is carried forward with the nuclear family in which he or she was reared ... assigned identification number reflects the child’s position in that household and biological status ....

• When a direct descendant has multiple marriages, these are discussed in chronological order ... within each marital summary, biological children are listed first, then children adopted by the direct descendant, then stepchildren of the direct descendant.

• When an adopted child or stepchild is listed for the direct descendant and spouse, an Arabic number is assigned ... A Roman numeral is not assigned, because the numeral traditionally ....

• The generation number that is assigned to the adopted child depends upon whether the child changes or retains the birth surname.

• If the adopted child uses the adoptive family’s surname, that child will usually be Generation 1, because he or she is the first in that biological line to use that surname.

• A stepchild or an adopted child who retains the birth surname will follow the generation number appropriate to the birth family whose name is preserved.

• In order to maintain a clear identification of biological ancestry, while including adoptions and stepchildren in the family structure, the phrases _adopted by_ and _stepchild of_ are added to the parenthetical summaries of descent.

[end clips]

Hope this helps. --GJ
ttwetmore 2011-01-09T10:31:33-08:00


In an attempt to answer your questions ... you pose some good examples. First a couple of principles I espouse ...

ASSO (which should have a better tag/name in Better GEDCOM - say relation) is a way to express ANY relationship between two persons. Whether the relationships should be two way I'm not touching here. Expressing the exact relationship has been under debate here. Some like lots of detailed tags covering as many cases as possible. Others opt for what I've called the "genus, species" technique of a generic term modified by a specific one, e.g., (father, natural).

That one was easy. The second issue boils down to answering the question of what is a family? This is a very thorny issue as can be seen in discussions that pop up all over the place, not just in many areas in the Better GEDCOM wiki, but in blogs and mail groups and lots of other places.

The GEDCOM philosophy is that a family by default is a married mother and father and their natural born children. The Mormons seem to have chosen to ignore or hide the nature of many real families. They did admit somewhere that HUSB and WIFE didn't really have to be married, but it was pretty grudging. But they do allow tags to modify the child roles also.

But Better GEDCOM arguments seem to be heading for the idea that a family is really "just" a group of people who live together. There often is a male figure playing the father role, a female figure playing the mother role and a bunch of kiddies playing child roles. These roles in the family can be the same as the natural biological roles or they can be very different. We all know these things, so it's probably pointless to point out the myriad of examples that exist, yours being a good one.

So the big question in my mind is how should we capture the different types of roles, natural/biological, official, legal, etc., and where do we put them in the model. I don't have magic answers. Like most genealogists my overall goal is to properly elucidate biological relationships, while also recording and tracking all other relationships that I find. Like most genealogists I assume that official relationships are biological relationships until proven differently. Keep an open mind but don't lose sleep.

I think the ASSO/relation approach can be used to show any relationship we want to express, and I mean any. And don't forget the roles that are used to connect events with their role players. It is these roles that often are closest to the evidence and have the best surety.

If we use the family object I SOMETIMES think that there should be a basic assumption that UNMODIFIED roles of FATH, MOTH, HUSB, WIFE, CHIL should be assumed to be the biological and official relationships, and that all other relationships should be indicated with role modifiers. This is rather awkward in the case of step-children where there is one natural and one step-parent. But not horribly awkward.

BUT I ALSO SOMETIMES think that the roles in a family should really just be a generic/social role. For example a step-child could just be a CHIL in the family, where all this means is that the person is playing a child social role independent of his/her exact biological/legal relationships with the parents or others in the family. This is a subtle concept and I always try to refrain from bringing up subtle concepts because they can cause long tangents when people don't fully understand what I'm trying to say, which is often, because of my poor capability to convey complex ideas. What I'm suggesting is that maybe roles in families should not convey biological connotations at all, just social relationships. But I doubt there'd be much agreement out there about that.

I would probably do your example as follows...

First the real birth event from, let's assume, an official record...

We have the birth event with three persons created from it. The event record points to the three persons in the roles of child, father, and mother. Because we don't further modify those roles, we assume a natural biological relationship between the three. We DON'T add ASSO/relation tags between pairs of these persons, because the event role tags have fully captured the relationships. We also don't immediately create a three person family record from this event, even though that is the knee jerk reaction that most genealogists might have. (As a quick aside, this is one of the reasons why I think multi-event roles are so important -- they are extracted directly from evidence and they unambiguously handle the relationships mentioned in the evidence -- all records and all relationships stem from the exact same evidence and form a unified whole of information).

Then you have your first family record. Assume this comes from a census record where James is mentioned as son. If this is where the record comes from we don't even know for sure whether James is a step son of Annie or not. Probably, but there is nothing in the evidence that says so. Like I said above I don't know what our final decision on roles in families will be. If we are allowed to express that James is a step-son of Annie in the child role in the family, then I would say there is no need for an extra ASSO/relation between James and Annie to document the step relationship. If we decide that family roles must be social roles only, then we could add the extra ASSO/relation between James and Annie to show the true nature of their relationship. But you are right in your implication, I believe, that the step relationship can actually be inferred by software by simply noting there is a biological parent relationship between James and Margaret (from the event record) and a social parent relationship between James and Annie (from the census record). So the ASSO is not really needed.
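
The inference described above can be sketched in a few lines. This is only an illustration, using the names from the James P Bruce and Pickstock examples earlier in the thread; the function name, the dictionary layout, and the rule that a biological parent must also be in the household before the word "step" is used are all assumptions, not agreed BetterGEDCOM behaviour.

```python
def infer_step_parents(child, biological, social):
    """Return social parents who look like step-parents of the child.

    biological: child -> set of parents taken from birth-event roles
    social:     child -> set of parent figures taken from family/group roles
    """
    bio = biological.get(child, set())
    soc = social.get(child, set())
    # Assumption: only call it "step" when a biological parent is also one of
    # the social parents; otherwise it could be adoption or fostering.
    if not (bio & soc):
        return set()
    return soc - bio


# Birth event roles: John is father and Margaret is mother of James.
biological = {"James": {"John", "Margaret"}}
# Family-group roles: James is a child in John and Annie's family;
# Arthur is a child in James and Mary Ann Pickstock's family.
social = {
    "James": {"John", "Annie"},
    "Arthur": {"James Pickstock", "Mary Ann Pickstock"},
}

step_parents_of_james = infer_step_parents("James", biological, social)
step_parents_of_arthur = infer_step_parents("Arthur", biological, social)
```

For Arthur the function returns nothing, which matches the point made earlier that an informal adoption leaves no biological link or event to infer from.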

That's a long way of saying I don't know the right answer, but I do know the answer depends on exactly what we want relations in family objects to mean.

I think adoption is similar. And yes, I would have an adoption event in the database with the roles to the adoptive parents and adopted child. And similar to the step child case, there would be no need for ASSO/relations between the persons already covered by the event roles, and the child relation in the family could still be just the social "child" role since the adoptive details were already captured by the event.

There are software implications to all this, of course. Software has to evaluate all the relationships it finds people in (there are three kinds, let's assume: relations established via events, relations to one another via ASSO/relation tags, and relations from family/group records). Software is going to have to evaluate what these relationships mean with respect to one another. For a software geek like me this is actually a lot of fun, and I've written code to explore some of these ideas.

We also have to remember that a family is a conclusion object most of the time and this has some implications on the nature of the relations that stem from it. Exactly what those implications are doesn't seem entirely clear.

Tom Wetmore
louiskessler 2011-01-09T11:56:30-08:00

One other thing to consider is that the lineage-linkage concept is something that should remain sacred. That is at the most simplistic level that parents are linked to children, and children to parents.

Now that does get complicated by whether they are biological or social parent/child relationships, but the key thing is that this link be here. This is because genealogy software draws pedigrees and descendant trees and some of them do relative trees.
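
A minimal sketch of the lineage-linkage idea: each child carries links to parents, each link is tagged as biological or social, and a pedigree walk follows only the kinds requested. The names, the link layout and the two kind labels are assumptions made for the illustration.

```python
def ancestors(person, parent_links, kinds):
    """Collect every ancestor reachable through parent links of the given kinds."""
    found = set()
    stack = [person]
    while stack:
        current = stack.pop()
        for parent, kind in parent_links.get(current, []):
            if kind in kinds and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# child -> list of (parent, kind of link); made-up identifiers.
parent_links = {
    "child": [("bio_mother", "biological"), ("step_father", "social")],
    "bio_mother": [("grandmother", "biological")],
    "step_father": [("step_grandfather", "biological")],
}

biological_line = ancestors("child", parent_links, {"biological"})
combined_line = ancestors("child", parent_links, {"biological", "social"})
```

The same links serve both a strictly biological pedigree and a combined one, which is one way to keep the parent/child link central whatever container sits between the two.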

Now GEDCOM has added the family in between as a container for those links. The HUSB and WIFE tags have always been problematic to me because of same-sex couples.

I don't think the family record is essential in BetterGEDCOM and whether to have a group, a family, just ASSO tags, or some other connector is up to what this group finds best.
AdrianB38 2011-01-09T14:18:11-08:00
Tom - thanks for your thoughts. I think the _possible_ concept of family roles being social roles is a useful thing to think about because it gives some clear words to play about with and to see if we can get meaningful explanations with. I don't want families (or roles!) to turn into the ASSERTIONs of BG!

And Louis - thanks for reminding me of the need to concoct lineages. I hadn't forgotten it, but it gives a useful test - does this method give easy access to lineages of biological AND / OR social form?
GeneJ 2010-11-16T11:25:54-08:00
oops ... forgot "marriage."

Birth event (date, location)[best knowledge of]
Death event (date, location)[best knowledge of]
Parents names
Spouse(s) (birth, death, parents names, recognition of prior spouse as appropriate)

Birth is sometimes given as baptism, as it might represent the best knowledge or evidence of the birth event. Death is sometimes given as burial, representing the best knowledge or evidence of the death event.
paulzag 2011-01-07T00:28:52-08:00
Watch out for gender binary bias. While most genealogy can be divided into male and female it is not always cut and dried.

A modern example:
If my gay cousins adopt children, those children are as much part of my tree as children adopted by my heterosexual cousins. This is even trickier if one of the couple is the biological parent of the child.

An older example:
paulzag 2011-01-07T00:35:34-08:00
Hmm I can't edit a post.

The gender field will become something like the "Race" field I've seen on some immigration records.
50 years ago it was normal
20 years ago it was strange
Now it is meaningless. If you have check boxes, what race is Barack Obama?
AdrianB38 2011-01-07T04:37:38-08:00
Thanks Paul - nice to have the concrete example of hijras.

Re adopted children - this is one of the things several of us are pushing for. See http://bettergedcom.wikispaces.com/Shortcomings+Of+GEDCOM "I Want My Genealogy Software And BetterGEDCOM To Do This" and in particular, "Distinguish between a group of people who live as a family (who might include informally adopted children) and biological or step-biological children who don't live with their biological or step-biological parents".

I _personally_ see Family in BG as a group of people, each of whom has a role against them in that group. These roles could cover all sorts of things like parent, adoptive-child or step-child. In addition, that membership is separate from events like "birth" or "adoption" allowing the separate recording of birth / adoption from family membership.

Not sure I know all the combinations of those values but I feel that the 2 concepts need to be split to allow families to be driven by other than biology or formal adoption.
GeneJ 2011-01-07T08:57:02-08:00
Note the "inclusion" styled language in the definition of genealogy given on the website for BCG (Board for Certification of Genealogists):



5. Question: What is genealogy, exactly? What tools or materials do genealogists use? Do genealogists specialize?

Answer: Genealogy is the study of families in genetic and historical context. Within that framework, it is the study of the people who compose a family and the relationships among them. At the individual level, it is biography, because we must reconstruct each individual life in order to separate each person’s identity from that of others bearing the same name. Beyond this, many researchers also find that genealogy is a study of communities because kinship networks have long been the threads that create the fabric of each community’s social life, politics, and economy.

Good genealogists use every resource and tool available, emphasizing original records created by informants with firsthand information. Genealogists have long studied economics, geography, law, politics, religion, and society in order to properly interpret records, identify individuals and relationships correctly, and place their families in historical context. The modern field of genetics has added another valuable tool to their intellectual toolbox.

Serious genealogists do specialize, as do all professional and scholarly fields, because no one can be an authority in all aspects of any subject. Some genealogists specialize in an ethnic group, some in a geographic region, and some in a particular type of resource such as military or immigration records. Some specialize in work with the legal system, others in medical research. The advent of genetics has created yet another specialty: genealogists whose expertise lies in the interpretation of DNA results and its application to genealogical research problems.
GeneJ 2011-01-07T09:20:27-08:00
Ooops. That sent a little fast.

Adrian wrote, "Distinguish between a group of people who live as a family (who might include informally adopted children) and biological or step-biological children who don't live with their biological or step-biological parents."

Modern genealogy recognizes the array of relationships that make up a family.

When we move beyond better known relationships (better known as biological, adoption, foster, step, ward/guardian) it becomes less clear to me which relationships do or do not constitute recognition of a "link" in relations to a genealogical numbering system.
AdrianB38 2011-01-07T11:17:52-08:00
"When we move beyond better known relationships ... it becomes less clear to me which relationships do or do not constitute recognition of a "link" in relations to a genealogical numbering system."

The answer is surely - whatever "I" say. As, I hope, "modern genealogy" recognises.

I have seen (or at least, I think I have seen), people try to tell an adopted child that they should not be tracing their adoptive ancestry. Utter nonsense. If someone thinks a group constitute a family, then whether there is a specific relationship whose definition is agreed in the OED or Wikipedia or not - they are _A_ family (and linked). I have a feeling that a family is just a micro-community. That's one reason why I want to move us away from a family that is defined by biology or step- or formal adoption to one that is just... whoever I want.
ttwetmore 2011-01-07T12:10:20-08:00
There are two ways genealogical programs can represent links/relationships between persons that I am aware of ...

The first way is to have a person record link directly to another person record with a role tag that describes the relationship. There would normally be a reciprocal relationship/link in the other direction. The ASSO tag in GEDCOM accomplishes this. Better GEDCOM would surely allow the same, though I hope would change the name of the tag. The role tag specifies the relationship. There could be lots of unique role tags, or there could be a "genus, species" approach. Here are examples of both...

1 ASSO @I45@
2 ROLE Stepfather

1 ASSO @I45@
2 ROLE Father <-- "genus" of the relationship
3 TYPE Step <-- "species" of the relationship

The second way of linking is through a third record that is not a person record and that binds the role players together. The most well-known of these types of records is the GEDCOM family record that has pointers named HUSB, WIFE and CHIL that point to the corresponding role players. Better GEDCOM thinking seems to be to generalize the GEDCOM family record with a group record which could bind persons together by other kinds of relationships.

The big difference between the two techniques is that in the first the person records know directly how they are related to each other and the relationships are direct inter-personal relationships. In the second case the relationships are between the group and the persons, so the relationships between the people are "indirect." For example if a family record points to one person with role "father" and to another person with role "child," these roles are with respect to the family. It is a leap of faith or at least a leap of inference to say that the father role player from the family is the father of the child role player. Subtle but important.

We should also mention event records. Event records describe events of "genealogical significance" that occur in the lives of persons. Many of these significant events are the events that establish or modify relationships between persons. For example a birth event generally includes at least three role players, father, mother and child, and the event establishes the biological parent/child relationships between the persons involved. The birth event also serves to confirm the "couple" relationship between the parents. In models that allow multi-role event records we have the same situation that occurs in family records. The roles document a relationship between the event and the persons and we have to infer the actual interpersonal relationships that are implied.

How does this play against the "there should be only one way to show information" principle that some of the BG workers are struggling with? What do I mean? Imagine you were the author of a genealogical application implementing a BG model-based database that allowed both of these kinds of ways to record interpersonal relationships, and you were working on the software for finding the father of a person in the database. You would have to know whether that person had any direct person to person links to another person with the role tag father, but you'd also have to check whether there were any family or event records that also serve to establish an indirect relationship to another person with the father role.
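A minimal sketch of that lookup in Python, to make the cost concrete. Everything here is hypothetical: the `Person` and `RoleRecord` classes, the `find_fathers` function and the role strings are invented for illustration and are not part of GEDCOM or any BG draft. It also assumes that a direct link `(role, other)` stored on a person means "other plays that role towards this person".

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    id: str
    # Direct person-to-person links, in the spirit of ASSO/ROLE:
    # each entry is (role, id of the other person).
    links: list = field(default_factory=list)


@dataclass
class RoleRecord:
    """A family or event record that binds role players together."""
    kind: str    # e.g. "family" or "birth" (hypothetical values)
    roles: list  # each entry is (role, person id)


def find_fathers(person_id, persons, records):
    """Collect father candidates from both kinds of linking."""
    found = set()

    # 1. Direct links stored on the person record itself.
    for role, other in persons[person_id].links:
        if role == "father":
            found.add(other)

    # 2. Indirect links: the person is a "child" role player in a family
    #    or event record that also has a "father" role player. This step
    #    is the inference described above, not a recorded fact.
    for rec in records:
        children = [pid for role, pid in rec.roles if role == "child"]
        if person_id in children:
            found.update(pid for role, pid in rec.roles if role == "father")

    return found
```

Both passes are needed for every such query, which is the duplication the "only one way" principle is trying to avoid.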

Personally I believe the BG genealogical data model must allow both these ways of recording relationships. The reason for this is that our genealogical research will determine relationships through a variety of ways, some more appropriate for recording in one way and some in the other. This has all to do with the evidence gathering process. Evidence of one kind is best dealt with as directly establishing person records with direct links between them. Other evidence, particularly evidence that documents events, should be recorded through the method of events with role-playing persons.

I don't see any issue at all with specifying the actual roles in either way. They can be step, half, natural, adopted, foster, and easily specified, probably best through the "genus, species" technique.

Tom Wetmore
AdrianB38 2011-01-07T14:13:19-08:00
Personally, I'm no fan of the ASSO link for several reasons:
- the app I use doesn't expose it very easily
- the need for setting up the link in both directions thus creating redundant data (so tell me when a stepfather doesn't have a stepchild in the opposite direction?)
- the complete lack of clarity in the standard about how I'm supposed to read it (if A is associated to B with "witness" is A a witness to B's event or vice versa? This suggests that 80% will use it one way and 20% the other!)

As for the group / family role and the event role, I suggest we need both. Err - is this me, the hater of duplication, really saying this? Well, yes. Gulp. We need the role in the group / family
(a) for ease of mapping on import of a GEDCOM and
(b) since there must be cases where there is no event that _naturally_ helps us record the relationship.

For example consider the 2 examples:

- "Robert Bruce" is a member of the group "Balfour, Guthrie" from 1907 to 1909, with a role of "partner". Or...
- "Arthur Elliott" is a member of the family-group "James & Mary Ann Pickstock's family" with a role in 1911 of "informally-adopted-child". Or a role of "child", sub-role "informally adopted".

Note that it is possible to write the first as an occupation (though I'd contend that his occupation was import-export merchant from 1869 to 1909. "Partner" was his role within the organisation just for those 2 years. Equally, one could have occupation "soldier" from 1914 to 1918 and role in group "Cheshire Regiment" that of "sergeant" from 1917 to 1919)

Note 2 - unlike the first example, I have no event to record "informal adoption" since there is _no_ such event. One can be formally adopted (which didn't start in the UK until later) or nothing.

Having written the above 2 in terms of their roles in a family or group, if I consider the membership:
- "James P Bruce" is a member of the family-group "John & Annie Bruce's family" with a role in 1891 of "child"
then I start thrashing around. James was the biological son of John's first marriage and the step-son of Annie. But there is no sensible role that reads "son of father and step-son of mother" (i.e. you could have such a role but for goodness sake!!!) I would rather just have the event:
- birth in 1860 in Dundee, role of "James P Bruce" is "son", role of "John Bruce" is "father", role of "Margaret Bruce" is "mother".

There is no event for the link between James and Annie since there is _no_ known event when he becomes a genuine member of the family. He does, because he eventually lives with them, unlike his sisters who may be step-daughters of Annie but are not (so far as I ever see) in any real sense members of the John & Annie family, unlike James who eventually lives with his father and step-mother - but I've no idea when.

Thus, the group / family role is subtly different from the event-role and not - unless we really create absurd definitions - the same value. Hence we need both.
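A rough Python sketch of keeping the two kinds of role side by side, using the examples above as data. The class names (`Membership`, `EventRole`), their fields and the `group_roles` helper are assumptions made for illustration only; nothing here is an agreed BG structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Membership:
    """A person's role within a group, valid for a year or span of years."""
    person: str
    group: str
    role: str
    start: int
    end: Optional[int] = None  # None means the role is only known for `start`

    def active_in(self, year: int) -> bool:
        last = self.end if self.end is not None else self.start
        return self.start <= year <= last


@dataclass
class EventRole:
    """A person's role in a single event."""
    event: str
    year: int
    place: str
    person: str
    role: str


memberships = [
    Membership("Robert Bruce", "Balfour, Guthrie", "partner", 1907, 1909),
    Membership("Arthur Elliott", "James & Mary Ann Pickstock's family",
               "informally-adopted-child", 1911),
    Membership("James P Bruce", "John & Annie Bruce's family", "child", 1891),
]

# The birth event carries the precise relationships; no group is involved.
birth_roles = [
    EventRole("birth", 1860, "Dundee", "James P Bruce", "son"),
    EventRole("birth", 1860, "Dundee", "John Bruce", "father"),
    EventRole("birth", 1860, "Dundee", "Margaret Bruce", "mother"),
]


def group_roles(person: str, year: int):
    """Return (group, role) pairs that apply to a person in a given year."""
    return [(m.group, m.role) for m in memberships
            if m.person == person and m.active_in(year)]
```

Note that Annie appears nowhere in `birth_roles`, and no dummy event is needed to place James in her household in 1891.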

Now, I have a distinct feeling that I could create events and put the roles against the events without having any roles against the group but (a) some of those events will be dummy events created simply to record the role and (b) some of those roles won't match any English term and thus will be prone to misuse.
AdrianB38 2011-01-07T14:17:22-08:00
PS - if Tom creates software that generates all the above on the fly from source data, he will have different needs for roles. I am talking from the viewpoint of explicitly recording my conclusions in BG, while citing the sources.
AdrianB38 2010-11-18T09:51:54-08:00
Need for Family Entity?
The page asks "Family entity - Is this needed or desired???"

I'm trying to draw things out myself to get them straight and my conclusion is that what we need is a Group entity type.

My notes to myself say:
"The GROUP-HISTORY entity type
"This entity type represents the current state of research into the history of a group of people, whether formally constituted or not. This is intended to cover any group of people where it makes sense to talk about the group in its own right, e.g. a family of individuals, a regiment, a business partnership, a school of artists. A group of somebody's friends would seldom qualify as a GROUP-HISTORY entity since it is unlikely to have any existence in its own right."

Thus, mapping from GEDCOM to BG would see the Family mapped onto the GROUP-HISTORY. Mapping the other way could be done by selecting only GROUP-HISTORYs of type "Family".

There would be relationships between the deduced history of individuals and GROUP-HISTORYs (many to many, optionality at both ends) and the relationship would have attributes to define the role that the person played within the group. (So, mapping from GEDCOM to BG again, that's where the Husband or Wife bit gets used)

HOWEVER - in addition to this, the Event of birth (or adoption) would (in my model) relate child to parents, again with attributes describing the link. Thus we have a firm basis for saying whether or not someone is a birth-child of the father and a step-child of the mother, e.g., rather than just simply, a member of two families.
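As a sketch only, the mapping in both directions might look like this in Python. The `GroupHistory` class, its field names, the role strings and both functions are invented for illustration; the only thing taken from GEDCOM is the idea of a family record with husband, wife and child pointers.

```python
from dataclasses import dataclass, field


@dataclass
class GroupHistory:
    name: str
    group_type: str  # e.g. "Family", "Regiment", "Business partnership"
    # Each member is (person id, role played within the group).
    members: list = field(default_factory=list)


def import_gedcom_family(fam_id, husb=None, wife=None, children=()):
    """Map a GEDCOM-style family record onto a GROUP-HISTORY of type Family."""
    members = []
    if husb is not None:
        members.append((husb, "husband"))
    if wife is not None:
        members.append((wife, "wife"))
    members.extend((child, "child") for child in children)
    return GroupHistory(name=fam_id, group_type="Family", members=members)


def export_families(groups):
    """Going the other way: only groups of type Family map back to GEDCOM."""
    return [g for g in groups if g.group_type == "Family"]
```

The roles here are deliberately coarse; the finer distinction (birth-child of one parent, step-child of the other) would sit on the birth or adoption event, as described above.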

NB - my use of GEDCOM terminology may be a bit flaky as I'm referring by memory only at this point.
AdrianB38 2010-11-25T10:54:17-08:00
While we can debate the ease of reconstructing family relationships from events or vice versa, I think this misses the point.

We were debating whether or not we need a Family entity. My opinion, stated at the start, was that there is a clear and definite need for a Group entity. No-one seems to have disputed that need. The relations between Person entity and Group entity must include a role. No-one seems to have disputed that either.

That being so, we may as well have a Family as a Group with a subtype of "Family", and therefore the relations between a Person entity and a Group entity that happens to be a Family subtype must include a role - i.e. "father", "mother", "child", etc. This does NOT obviate the need for recording the precise relationship on a birth event since in a family, someone might actually be "biological child of father, step-child of mother" or "biological mother to some, step-mother to others". The event would record the precise relationship - the Group a rough value. It is then up to the software designers which they use when and where.
hrworth 2010-11-25T11:15:17-08:00

If you are just looking at the "Name" attribute of the person, you might be right. But, aren't there other "attributes" that help define that Person? Dates and Relationships come to mind.

To me, the application that presents the "person" has to generate a Different Attribute for the 2 Unknown's in the example.

I am trying to point to my observations, that some work has to be done by the sending and receiving application.

The Sending Application has a better definition of that person (dates and relationships) than the BetterGEDCOM, so it should identify the Unknowns differently. The Receiving application should then be able to reconstruct what the sending application had.

OH, by the way, the Unknown, might in fact be the SAME person.

Example: Marriage to Unknown, Divorce, Marriage to Known, Divorce or death, Marriage to Unknown (the first one).

The sending application would know the other attributes for that Unknown person, and indicate that, to the BetterGEDCOM, using the SAME INDI ID (or whatever the person ID will be called) indicator for that person.

testuser42 2010-11-27T13:54:09-08:00
Russ, I agree. I think that the two "unknowns" would be automatically created as two different "evidence persons" as you enter the events they have a role in. There would be only very little information about them in these "evidence persons", but the fact that they are mentioned in a source and share an event with a research subject makes them relevant.

Then, after finding out more about the unknowns, there will be additional sources that produce new evidence persons which then can be linked together in a "conclusion person".
Later, one might find that two conclusion persons are the same real person, so you'd either merge these conclusions, or (probably better) join them under a new conclusion person.

If you want to make an assumption early on, say, that the two marriages were to the same person, then that's fine. You will be having reasons for believing that, which you should put into the new conclusion person that links the evidence you have.
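To make the layering concrete, here is a small Python sketch. The `PersonRecord` class, its fields and the `sources_of` helper are hypothetical names for illustration; the assumption is simply that an evidence person points at exactly one source and a conclusion person points at the persons it was built from, plus the reasoning.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PersonRecord:
    label: str
    source: Optional[str] = None                   # set on evidence persons
    based_on: list = field(default_factory=list)   # set on conclusion persons
    reasoning: str = ""                            # why the pieces were joined


def sources_of(person: PersonRecord) -> list:
    """Walk down the tree and list every source a person rests on."""
    if person.source is not None:
        return [person.source]
    found = []
    for lower in person.based_on:
        found.extend(sources_of(lower))
    return found


# Two "unknown" spouses start out as separate evidence persons.
first = PersonRecord("Unknown (1st marriage)", source="first marriage record")
third = PersonRecord("Unknown (3rd marriage)", source="third marriage record")

# A later assumption joins them under a new conclusion person,
# without merging or destroying the evidence persons below it.
same_spouse = PersonRecord(
    "Unknown spouse",
    based_on=[first, third],
    reasoning="assumed to be the same person; reasons recorded here",
)
```

Because nothing is merged, undoing the assumption is just a matter of dropping `same_spouse`.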

So what about "groups of type family" in these cases?

I can now think of three ways a family-group would get created:
- by a marriage (or similar)
- by a birth
- manually by the user

I'm thinking that family-groups could build up tree-like, similar to how a conclusion person is built from previous conclusions and evidence persons.
A source telling of a marriage would be evidence of an event, which would create an "evidence family". Another source might tell of a birth to the couple. This will generate a new evidence family, which will be joined with the old one by a "conclusion family".

Does that make sense?
Or should a family be only created once and changed / amended if new information is available?
hrworth 2010-11-27T15:08:16-08:00

First, I don't understand the terms that you are using.

Evidence Person
Conclusion Person

I have a PERSON, I gather information about that person. Events, Fact, Stories, from various and sundry sources. Each piece of information is put into my file with the appropriate "evidence", in the form of some documentation. (won't use the term citation).

Why do we have to have a term other than PERSON? That person WILL change over time as new pieces of information are added. The new evidence will change the characteristics of that PERSON.

That PERSON will have relationships with other PERSONs. What that Relationship is may or may not be a "family" as is usually defined. Those relationships may or may NOT have any evidence.

No evidence family, no conclusion family, but Relationships. Oh, and those relationships may or may not change. In fact, they will probably change, and without any evidence.

Isn't Genealogy about Relationships, or is it about 'families'?

In my humble opinion, we don't need to create new terms to move this project along.

What am I missing?

testuser42 2010-11-27T15:37:05-08:00
Hi Russ,
I think I'm using "evidence person" and "conclusion person" in the sense that's been used by Tom and others.
Both are recorded in a BG using <person> or something like that. A <person> element in the BG that is based on only one source (=piece of evidence) is an "evidence person".
A <person> that combines several other <persons> is a conclusion person or hypothesis person (because you concluded / hypothesized that different pieces of evidence concern the same real life person). The <person> on the top of a whole tree of other <persons> will be/represent the most recent hypothesis/conclusion. This would be the person you see first in your genealogy software.

That's pretty much how I understood other people's use of these terms. Only now have I read the thread where good arguments were made for using the word "hypothesis" instead of "conclusion". So I'll try and use that in future.

Of course there's only one final PERSON. But it is a representation made by hypothesis out of earlier hypotheses and analyses of various pieces of evidence.

I didn't try to create new terms, except for the combination of "evidence" and "conclusion" with "families". "Family" being short for a <group type="family"> or something similar. And evidence and conclusion used as above. All used together to build a tree of <group type="family">s.

If this invention of mine makes any sense at all I don't know :-) I was trying to be consistent handling evidence and hypotheses concerning families (any groups, actually) the same as evidence and hypotheses concerning persons.
Maybe it is in fact nonsense -- I'm really not sure, that's why I put it out for discussion.
hrworth 2010-11-27T15:49:39-08:00

I know you are. I just don't understand why we need to invent new terms for information.

Secondly, NONE of this is "recorded in a BG". Data is TRANSPORTED using the BetterGEDCOM technology, what ever that might be.

What information is presented about a PERSON, from an application to be transported to another application, that Data and each data element that WE define, gets packaged up and sent on its way. At the other end, that Data is then broken back out.

Where is this "tree" <group type=family> created? In the BetterGEDCOM or the application.

X has a relationship to Y

X has a relationship to Z

X has a relationship to ....

The application then determines that X to Y is father and spouse. X to Z is father to son. If there is no relationship between Y and Z, what does that mean, other than no relationship (at this point in time).

What is a "family"?

Now, if X and Y had a Shared Event called a Marriage, you could assume that Y does have a relationship to Z. Without some evidence, the relationship between Y and Z is unknown.

What is a "family"?

There are three people (persons). Some of the relationships have been defined, and some not.

The Application might present X, Y, and Z, as a family. The sending of that information does not really 'care' if it's a family or not. The BetterGEDCOM is NOT presenting any data to the End User.

testuser42 2010-11-28T07:12:01-08:00

"NONE of this is "recorded in a BG". Data is TRANSPORTED using the BetterGEDCOM technology, what ever that might be."
- But to transport data it needs to be put into a computer file, doesn't it? This is the "BG", whatever it will be in the end. (probably a container format holding a "bg-xml" or similar, and the media files and so on).
I don't care about what the software uses as a database, though I would love a software that uses BG internally, too.

"Where is this "tree" <group type=family> created? In the BetterGEDCOM or the application."
- ideally, in both. (see above)

About relationships making "family"-objects useless:
I did start out thinking we don't need "families" in BG. I thought it would be enough to have relationships in the BG that the application then parses to construct a family to show the user (if he wants to see a family).

But Tom and Adrian say it won't be possible to "derive the family for reports, etc. from lower level events". I believe they are the experts on the programming side.
Add in the "traditional" mindset of many amateur genealogists that revolves about family, and you have good reasons to keep a "family" object in the BG structure.
IF it were easy for the software to do that job recursively, then I'm all for getting rid of "family"-objects.

I do agree with you that it is hard to define a family by today's standard.
I even think it's not necessary to use that term in genealogy at all. I'm not using any concept of family in my genealogy, I just look at people and their relationships. I am angry about one software I use, because it only accounts for biological parents. I would want to show the adoptive parents in some cases.

Still, I guess to accommodate the large number of "traditionalists" we should leave a "family"-object in BG. I don't think it would hurt much, and it might be useful for easy import of GEDCOM, or easy report generation.

Or do you think we will be able to get hobby genealogists away from the idea of "families"? That would be great!

Maybe we're wrong and hobby genealogists don't care about "families" so much. I guess users generally don't care about the inner workings of their software and their GEDCOM. They do care about the presentation of their ancestry on screen and on paper reports.
Can we give them the reports they are used to without using the "family" object that seems to be necessary for that in today's GEDCOM and databases?
mstransky 2010-11-28T07:54:03-08:00
testuser42, reading the whole post, I have adopted children in my family also, some with known parents, some not. But on a historical level when a person does, say, a research or presentation on PersonX, "X never knew his parents, but were raised in q&p home when ...."
something to the effect of writing a book and capturing the UNKNOWN storyline. Also "X was a witness at J.F. Kennedy's wedding": it is the people that touched X's life but were not a direct relation to the sources. Kind of like a Forrest Gump movie, X's influence that was shared.

In my model I can grab the events of X being near others' source documents. I am thinking of a way to....., Actually I can: my model does grab all the events that X participated in, was near or was involved with, as long as there is a source/document/book snippet, and on the app side display all the items in chronological order.

When the BG model comes about in physical form, that is one feature where I can maybe have some input, or suggest a way for it to be done if it does not do it.
hrworth 2010-11-28T08:24:18-08:00

Yes. There is an application that I use. Hopefully, that application will provide the tools to generate a BetterGEDCOM compliant file. The Transport sends the BetterGEDCOM compliant file to another EndUser. The receiving application would have tools to extract that data and present that data to the EndUser.

Perhaps I should not have used the word "recorded". I meant it as a 'BG' thing in the cloud between the two applications.

gthorud 2010-11-28T15:30:27-08:00
I tend to agree with those that want to get rid of the family entity. I think it is better to use the relations between persons implied by the birth, marriage and other events. However, for reasons of backwards compatibility, and the abilities to associate e.g. a family photo with a family, it is necessary to have a group of type family. I don’t think that any user will care if there is a special entity type in the BG file called family, or if it is a group, so a group should suffice.

The reason I primarily want to have direct person to person entities is for example that I find it strange that I have to add a family if I only know one parent. Also, if the relation between the child and a parent has a “surety” which is different from that of the other parent, a family entity can’t handle it. Similarly, if one parent is a biological and the other has adopted the child, the single family-child relation is not capable of expressing that.

A receiver of a BG-file will have to handle a situation where the child-parent relation is encoded through events only, through a family group only – or unfortunately – a mix of both.
brianjd 2010-12-03T11:41:57-08:00
I am of two minds on this. I see no need for a family entity from a data perspective.

But, on the other hand for report building and searching, I am certain, I would write code to do exactly that type of thing. The family/group entity would be very handy from an application standpoint.

As for creating a family when there is only one parent, a parent and child are a family, so it makes sense.

I think a group entity would be sufficient; then families are easily reported from the group of type "family". So my vote is to support families, but using a group entity, to make it generic. The fewer entities we can make the better.
mstransky 2010-12-04T20:04:47-08:00
I have a few ways to reach a few solutions. Forgive me if I might not use the same terms but will try.

1.a Capture the family Household of father Mother group as pedigree by real parents.
1.b Also capture group households like foster child and/or parents extended relatives.

2.a Also land Groupings for areas, name changes and splits and combines, gps.

3a. Being able to Group Locales and Persons together for actual events that took place at a time and date.
3b. or a span of dates.

What are the goals wanted some, none, or all?,
I can see a way of doing this quite easily. If the push is in GEDCOM I can give a GEDCOM flat file concept example, but I prefer the xml concept.
hrworth 2010-11-19T09:34:08-08:00

I agree with your Conclusion, but would add

At this point in time.

Reason: I find new evidence that redefines the 'family' at that point in time. I would then review and perhaps change my conclusion, pointing to a new point in time.

My best example is my 6th Great-Grandfather. In that family, some said that there were 6 children. Later research found that there was a 7th child; the name was the same as the 6th child, but the 6th child died in childbirth.

The first conclusion, based on evidence, the family had 6 children. After the new information that conclusion changed, adding the 7th.

All that to suggest a point in time, and a suggestion that the conclusion may (will) change over time.

I am not sure if that gets into your Layers of Conclusions.

cowe 2010-11-24T08:00:15-08:00
I would strongly argue against a family entity.

An example: You have a family with a father (A), a mother (B) and a child (C). Then you find a person D which is also the son of A. How do you record it? You have two options: 1. You add it to the existing family, or 2. you create a new family with an unknown mother. Whatever you choose you have created undocumented or even false information. In the first case you have created an undocumented (or false) mother-child relationship. In the second case you have created a second family possibly making others believe person A was married twice or had children with two different women. Again, this is at best undocumented, in worst case false.

Instead of families I would use simple parent-child relationships. Don't record more than you know. It would be the application's task to combine all those relationships into families if required.

Even in a single family all relationships may have different documentation and different certainty. And there may even be different types of relationships like biological relationships or adoptions. It simply doesn't make sense to combine all of this into a single family entity.
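A minimal Python sketch of what "the application's task" could look like, using the A, B, C, D example above. The `ParentChild` class, its `kind` and `certainty` fields and the `group_into_families` function are hypothetical; the grouping rule (children who share exactly the same set of recorded parents) is one possible choice, not a proposed standard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentChild:
    parent: str
    child: str
    kind: str = "biological"    # could also be e.g. "adoptive"
    certainty: str = "unknown"  # each link carries its own certainty


def group_into_families(links):
    """Group children by the exact set of parents recorded for them."""
    parents_of = defaultdict(set)
    for link in links:
        parents_of[link.child].add(link.parent)

    families = defaultdict(list)
    for child, parents in parents_of.items():
        families[frozenset(parents)].append(child)
    return dict(families)


links = [
    ParentChild("A", "C"),
    ParentChild("B", "C"),
    ParentChild("A", "D"),  # D is a son of A; nothing is known about a mother
]
families = group_into_families(links)
```

Here C ends up under the parent set {A, B} and D under {A} alone, so no mother is invented for D and D is not silently attached to B. Whether and how to display those two sets as one family is left to the application.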

Families defined as a group could perhaps be useful in other ways though, as a definition of a nuclear family. A nuclear family is, after all, just a group of people living together. Their biological relationships are a different matter.

mstransky 2010-11-24T08:15:23-08:00
There might be another way to include members of a household separate from the nuclear family.

Nuclear family - father, mother, children.

make it like a source document like a census record.

Let me go on a tangent to include abstract uses.

say for adoption or "out there" harems? (don't know how to spell it)

*I don't like this time line example*
Ok, censuses have time lines and have nuclear families and even list a cousin or uncle in the household, and the other odd one could be a chief with four wives and 22 kids.

What you could do is view an event like Household, which is like an unofficial census at a point in time.

Make a GroupingID so that when you select an individual he references a particular sourDOC
that source doc "census or unofficial one" will select and display all individuals associated with it
- indi@988@
- indi@958@
- indi@968@
- indi@77@
- indi@354@

and FAM@ @ would be something that stays the way it is for pedigree and navigation purposes.
testuser42 2010-11-24T10:09:30-08:00
Christoffer, yes, that's a very good argument. We should not add another complication that will lead to wrong interpretations.
Families can be constructed out of links by the software, if a family display is wanted.

But I still would like to have a group entity. And if there are groups, there will be a "type=family" for groups. If there is none in the standard, people will define it on their own.

So how to make sure that these badly constructed families don't happen?
Maybe it'll sort itself out. If people add a family group (=conclusion) and have the sources (=evidence) to prove it, then fine.
If they can't back up their claim, that's no problem either. Since there's no proof, no serious researcher will take this claim into their own data.

To sum it up - I guess the software should not put people into "families". If a source mentions a group that is best called a family, a user may enter this as a group of type family, and link the people mentioned in the source.
If the user wants to make a group of type family on his own, he should be able to.

What do you think?
ttwetmore 2010-11-24T10:35:28-08:00
If you get rid of the family object you alienate 99% of all genealogy software users. Their ENTIRE GOALS in life are to establish their ancestors and their families. If you look at any serious genealogical journal you will see that a family is the single most important organizing principle in every article.

I consider the idea of getting rid of the family object as one of those "cute" ideas that should just be put in the dust bin. The argument that you can construct a family on the fly via software from the other records that are in the database doesn't fly with me at all. How are you going to put in birth order if it can't be derived from the existing data? How are you going to get a person into a family with two parents if you don't have exact evidence about every child's mother? You HAVE to have a family object that is a CONCLUSION object where YOU put in that order, where YOU make the assumption that even though a mother is not mentioned for one of the kids, you believe you know who she was. The family object is a critical conclusion object that must be available to bind together a group of people into families when the low level evidence is not enough. Software reconstruction will simply not hack it. Period.

Tom Wetmore
mstransky 2010-11-24T10:52:47-08:00
I agree with Tom
"you alienate 99% of all genealogy software users"

If one HAS to make a new household view, do it from the SourceID documents and filter the persons' events linked to them.

Then you get a filtered, sorted list of persons by role or age. I think that is an XSLT or app-side function.
AdrianB38 2010-11-24T10:59:52-08:00
Tom - are you OK with replacing Family by the wider concept of GROUP?

Then converting from GEDCOM, every FAMily would be entered into BG format as a GROUP (say, with subtype "Family") and the software could display "Group: Family" or "Family Group" or "Family" as it sees fit.
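
A minimal sketch of that conversion, assuming a hypothetical `Group` record with a `subtype` field and role-tagged members (none of these names come from GEDCOM or from any agreed BG format):

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Group:
    """Hypothetical BG-style group record."""
    group_id: str
    subtype: str
    members: List[Tuple[str, str]] = field(default_factory=list)  # (role, person id)


def family_to_group(fam_id, husband=None, wife=None, children=()):
    """Turn the parts of one GEDCOM FAM record into a Group of subtype 'Family'."""
    members = []
    if husband:
        members.append(("husband", husband))
    if wife:
        members.append(("wife", wife))
    # Children keep the order in which the FAM record listed them.
    members.extend(("child", child) for child in children)
    return Group(group_id=fam_id, subtype="Family", members=members)


group = family_to_group("F1", husband="I1", wife="I2", children=["I3", "I4"])
```

Software could then label this as "Group: Family" or simply "Family" when displaying it.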

Like you, I am unconvinced that we can derive the family for reports, etc. from lower level events.
testuser42 2010-11-24T11:05:14-08:00
Tom, I get your point, too.
And I believe in your technical know-how, so if you say "Software reconstruction will simply not hack it" then I believe you.

So the software should add a family (probably after an event like marriage or birth is recorded?) but the user should decide who's in it?

I mentioned the family (always?) being a CONCLUSION... of course this makes it easier!

You have convinced me that it is
a) possible and
b) actually a good idea
to have a "family" entity.

New questions:
Could there be an "evidence family"? What would that be?
And what about other "groups" that aren't families? Do you want to be able to record them? Then we would be back to a group entity with various types. And a "<group type="family">" is just a longer way to say <family>...
AdrianB38 2010-11-24T11:06:53-08:00
Re your discussion about "a person D which is also the son of A". I really don't see the problem with a "new family with an unknown mother". Surely people understand that the mother is unknown and that she therefore could be the same mother as before? Otherwise, I could be really silly and argue that your simple "parent-child relationship", recording only the father, could mislead people into thinking that the child did the biological impossibility of having no mother.

I think we have to credit people with some sense... (Did I really say that?)

Your principle of only entering the minimum known is sound, but I don't think missing out a mother in a family group is adding too much.
cowe 2010-11-25T06:46:27-08:00

I have plenty of examples where a person has been married two or more times and one of the wives is unknown. Such an example can't be distinguished in Gedcom from the case in my example above where it is simply not known who was the mother.

And it is not a problem to build families out of such simple relationships. I have done it. I have no problems importing Gedcom families and converting them to relationships, and neither do I have problems converting relationships to families for Gedcom export. I also do it in reporting.

mstransky 2010-11-25T08:00:58-08:00
In Gedcom the FAM block lists
Fam, FamH, FamW, Famc, Famc, Famc,
and each INDI also lists FamS.

I have used a different method to eliminate a number of recurring FamC nodes, and to eliminate an ID tag pointing to a spouse that does not exist, which causes an error because there is no actual spouse. All navigation can still be handled and performed properly.

is this the time to show such an example or does that come later?
AdrianB38 2010-11-25T10:44:50-08:00
Christoffer - re "a person has been married two or more times and one of the wives is unknown. Such an example can't be distinguished in Gedcom from the case in my example above where it is simply not known who was the mother."

Are you saying that in the second case, you know that the mother is one of the known wives, but you don't know which?

If so, I understand your scenario but I find it very difficult to think of any case where anyone could be confident that, in the absence of a known mother, the researcher nonetheless could be certain that the mother was one of the known wives. I suppose it's possible if there's an explicit statement that all the children are legitimate and you're confident that you have found all the wives. However, I think I'd probably just add a note to the child or family to state what was going on - this would make it far clearer than just leaving coded values there.
ttwetmore 2010-11-18T17:33:22-08:00
I agree that a Family is a Group. And I also agree that a Family is a conclusion entity, which is also what a GROUP-HISTORY object is.

I happen to be in the pro-Family camp, so have no problem with a separate entity type for the nuclear family, since the nuclear family is, regardless of what anyone says, one of the most important concepts both in genealogy and in the minds of the people who do genealogy. (And shouldn't a data model truly reflect the mental model of the people the data model is to serve? Huh?). So I would rather there be a class called Family. But I could live with a more general class called Group that could be tagged to indicate the type of the Group.

Tom Wetmore
hrworth 2010-11-19T06:22:24-08:00

Would you describe what 'a conclusion entity' means? To me, a Family is one thing, but the term conclusion means something else to me.

I might have many pieces of evidence as to the makeup of a family, but I may also have some conflicting information within that evidence and may not be able to come to a conclusion as to the makeup of that family.

Would there also be several layers of a "conclusion"? That is, the conclusion about the information of an individual, then the "conclusion" that this person was a member of that family, then the "conclusion", based on the evidence, that at a specific point in time, this grouping of people was a family.

Thank you,

ttwetmore 2010-11-19T09:04:24-08:00

I agree with your analysis. To me a conclusion entity is an entity not taken directly from evidence. By and large there is little evidence out there in the world of primary evidence that defines a family. In a sense a census record enumerating a household heads in that direction, but it only enumerates the partial state of a family at a specific point in time, and the roles mentioned are sometimes ambiguous.

When you create a Family record in any genealogical program today, it is a step where you sit down, create the family record, and then bind into it people that already exist in your database (usually anyway). These people are 99.99% of the time conclusion persons (if only for the fact that these are the only kind of persons genealogical programs actually support!), so the record built out of them is a conclusion also. What is that conclusion? Pretty simply it is, "These people whom I have researched and concluded to be real, all have some evidence that they were parents and children, and because I have seen and considered all this evidence and find it compelling, I truly believe they indeed form a nuclear family, and this is what it is."

Yes, there are all kinds of evidence behind the family, as you say, in the forms of the birth evidence of the children and the marriage evidence of the parents. The family record should reference either all these events or the conclusion events that also cover the same events. I know that sounds a bit confusing but I don't think it has to be.

I think there are layers of conclusions throughout our research. Examples needed. Someday there will be some!

Tom Wetmore
AdrianB38 2010-11-18T10:02:48-08:00
Need for the Event entity?
The page asks: "1.Is the Event entity required by the BetterGedcom model? Is the Gedcom approach adequate, or should an approach like Gramp's be used?"

Beware - we are in danger of confusing two things here.

The page states "Some current genealogical data models (e.g., GEDCOM) do not include the concept of an Event entity, choosing to place event information within the most appropriate ("primary") person record". The bit about placing the event information inside the person (which is an accurate description) defines the physical way in which a GEDCOM file is defined and written. That is a completely separate thing from the DATA MODEL.

Techniques for data modelling do not work well with repeated attributes (e.g. multiple event attributes for one person). Hence the repeated attributes are always split off to form a separate entity in the DATA MODEL. Therefore the data model corresponding to GEDCOM _will_ have a separate entity type for Event, even though it gets sucked back into the Individual when the physical structure of GEDCOM is defined.

(If you really want to get techie....
- no, I doubt this is a good idea.
- I suspect it's only possible because individual to event is a 1 to many in GEDCOM and the relationship from Source _to_ Event isn't physically recorded in GEDCOM - only its opposite)
hrworth 2010-11-18T10:16:59-08:00

Just want to try to understand what an "Event" is.

My understanding is that a Person / Individual will have a number of "Events" that are associated with that Person. There may be some "Events" that are shared with another.

An Event may have some characteristics. That may not be the right term, attributes might be better.

An Event would have a Name, it may and probably does have a Date, it may or may not have a Location, it may or may not have a Description or some additional information, it may or may not have a Source, and IF a Source a Citation. It might be an Individual Event or a Shared Event.

Example: Birth of an Individual.

Birth = Fact Name

Date = Date of the Event

Place = Location, with City/Town, County, State, Country information

Description = Hospital

Source = Birth Certificate

Citation = Details of that Birth Certificate

For this Event, it is NOT shared. (Probably a discussion might have to take place about whether it is shared or not shared with the Parents.)

A Marriage Event would be Shared, with similar information, but in the Description Field, a Church Name, a Justice of the Peace, Home, etc. would provide the details within that Location.

In this case, being shared, the same information would apply to the couple whose Marriage is being documented.
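
To make the shape of such an Event concrete, here is a small sketch. The field names follow the list above; the `participants` list, the `shared` property and the rule that a citation requires a source are assumptions added for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """Hypothetical Event record: a name is required, the rest is optional."""
    name: str
    participants: List[str]
    date: Optional[str] = None
    place: Optional[str] = None
    description: Optional[str] = None
    source: Optional[str] = None
    citation: Optional[str] = None

    def __post_init__(self):
        # "IF a Source a Citation": a citation only makes sense with a source.
        if self.citation is not None and self.source is None:
            raise ValueError("a citation needs a source")

    @property
    def shared(self) -> bool:
        """An event with more than one participant is treated as shared."""
        return len(self.participants) > 1


birth = Event("Birth", ["I1"], description="Hospital",
              source="Birth Certificate", citation="Details of that Birth Certificate")
marriage = Event("Marriage", ["I1", "I2"], description="Church Name")
```

Whether a birth should also list the parents as participants, and so count as shared, is exactly the open question raised above.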

Does this fit in with what you described?

Thank you,

AdrianB38 2010-11-18T11:35:12-08:00
Russ - in essence, yes it does fit in.

Perverse of me to discuss the necessity of the Event entity without defining it first! We might have to further discuss the exact Attributes of the Event entity (that's the correct IT terminology), (e.g. I think birth events should be shared with the parents in some fashion) but we agree on the top-level view of it.
DearMYRTLE 2010-11-18T12:00:09-08:00
I think an event CAN be shared -- marriages, birth of children.

On the front side, for the end user, the events of the birth of one's children place that parent (at the very least the mother) in a specific place.
ttwetmore 2010-11-18T18:18:10-08:00
I guess this is the thread where I should have sent my definition of an Event. I do have some problems with this Wiki because it is almost impossible for my aged brain to keep straight where all the different discussions are going on, especially since some are far apart in wiki-page-space, but overlapping in topic-space. I just found this thread again after defining Events elsewhere. Sorry.

I will answer the question posed by this thread in my usual blunt fashion.

WE NEED THE EVENT ENTITY. All genealogy begins with finding evidence for events and then using that information as the basis for all conclusions. If we don't record our starting points we don't model our process. If we don't model our process and only want to show our conclusions, we could use Gedcom and get a good night's sleep.

Putting Events into Persons (whether in "record life" or in "model life", as warned about by Adrian) is the wrong thing to do. Events should stand on their own and interrelate with the Persons who play roles in them. Events do not play subservient roles to Persons, which is how most models treat them. Even the Gramps data model, which does keep the Event and Person classes distinct, has the Events subservient. (Persons get to choose which Events they play a role in, that is, they "point to the Events" and can change their pointer at will. In Gramps Persons choose their Events, Events don't choose their Persons. Gramps Events are inert entities, they just sit there waiting to be referred to.) In a proper evidence model, evidence persons are defined by the same evidence as are the evidence events, and the two concepts are co-equal and the entities/records/instances/objects/elements/items/thingys representing these evidence events and persons should be permanently bound together at creation time and then for all time through the role relationships. The only discretionary pointing that should be going on is from conclusion persons to evidence persons and conclusion events to evidence events.

Tom Wetmore
hrworth 2010-11-19T06:55:08-08:00

This comes from an End User, trying to share my research with another End User. We may or may not be using the same application for our research.

I am trying to see where the Presentation of the information ends and the transport of that information begins. Then, at the other end, the transportation ends and the Presentation begins.

Trying to keep it simple, in my simple mind, I have a Person in my file, who has Events in their life, and I have the appropriate Evidence for that Event in the Person's life.

Won't get into the Relationships between people, between people and events, nor the reverse.

Aren't we about defining the attributes of the Person, the attributes of an Event, the attributes of the evidence that support the person and event?

Shouldn't the Model define the makeup of these three (just trying to understand the concept here) types of information (Person, Event, Evidence)? These models break down what might be included in that information and define a link between the pieces.

Then the change from Presentation to Transport would be able to look at the Model and determine what needs to be "sent" from the presentation (what I am looking at) to the Transport (getting the information from me to another user), then at the other end, break the pieces apart and put them into the Presentation at the other end.

As I see it, as a User, I may start from a piece of Evidence and create the Person, add Events for that person, and include the Evidence that supports that event.

Now, let's say that I don't include any evidence, because I haven't found the complete information yet. But I still want to share that information.

The Presentation to Transport would just "say" in some form, "no evidence present", but pass that along. At the other end, the "no evidence present" would be displayed the way the receiving application shows Evidence.

It would be up to the software developer to present what they have chosen to present or not present.

The BetterGEDCOM models, I think, should present the options that may or may not be present in the Data Model.

I won't go into the conclusion discussion, as I have brought it up earlier.

Just trying to understand, as a non-technical end user.

AdrianB38 2010-11-18T12:00:32-08:00
Which concept of Person entity?
The main page defines 2 sorts of person entity:
1. "A Person entity that records only the information about a person that can be taken from a single item of evidence. ... This kind of Person entity is sometimes called an evidence entity"
2. "the type that holds all the [deduced] information known about a person. ... This second kind of entity is referred to as a conclusion entity "

The main page then asks: "1. Which concept of Person entity should the BetterGedcom model support?"

In my view, the BG data model must include "conclusion" version of Person, and I suspect also ought to include the "evidence" version in order to support "evidence management". However, I am not sufficiently familiar with evidence management to say much more about the "evidence" version.

My reasons for including the "conclusion" version of Person include (not surprising to anyone who's read my posts) the necessity of converting from GEDCOM to BG. The Individual in there is a "conclusion" version of Person so it's the easiest conversion. It is possible to envisage unravelling the conclusion version to provide evidence version for each source for each GEDCOM attribute or GEDCOM event but it would be tricky. And if it were done, how could one deduce the location and date for each evidence entity if the location came from one source and the date from another? You'd never find that level of detail in machine readable form in a GEDCOM. Further, any notes I had written against a birth event (say) would be conclusions - these could not be split and would have to be copied against each evidence entity - where they would make no sense at all, as they refer to everything.

I have another reason for also wanting to include the "conclusion" version of Person. If we only include the evidence entities, then I find it difficult to envisage that 2 different software programs would come up with factually identical reports on the same "person" - the input data might be the same but the logic to operate on them would be subtly different, leading to different reports.

Also asked on the main page "2.If the BetterGedcom model supports both evidence and conclusion person concepts ... would there have to be two separate kinds of Person entities?"

I believe these are 2 different entity types - the description is different so that's 2 types for me. Six months later, with a fully populated data model, we might realise that the 2 entity types are identical except for a couple of attributes - at that point we could deduce the existence of a "super-type" PERSON which has "sub-types" PERSON-EVIDENCE and PERSON-CONCLUSION but we have to do that work first, not guess now. (Not sure if "super-type" and "sub-type" are the correct terms... Why didn't I keep the notes from that Oracle course when I retired? <grin>)
ttwetmore 2010-11-18T17:46:14-08:00
My questions were rhetorical. It should be no surprise that I have strong opinions about the answers, which are (if not already obvious):

1. Which concept of Person should BG support? Both. If BG models genealogy, it models genealogy, so it seems obvious it must model what genealogists do? Genealogists collect evidence and make conclusions. The problem with conclusion-only models is that they don't model what genealogists do, they only model the end results of what genealogists did.

2. Do we need two classes for the two Person concepts? No. A single class/entity type can do double duty quite well. See the DeadEnds model doc for one way how. It's really pretty cute how you can build up Person trees showing your entire history of your conclusions, with justifications at every level, and things are always reversible when changes are required based on new evidence, because no evidence is ever lost. Also see the Combination doc for how this exact thing was done in a real world application. I don't think we need to wait six months to decide.
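
The following is only an illustration of a single class doing double duty, not the DeadEnds model itself. The field names (`source`, `justification`, `based_on`) are assumptions: a Person with no `based_on` entries is treated as an evidence person, and a Person built from other Persons is a conclusion person.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    """One class for both evidence persons and conclusion persons."""
    name: str
    source: Optional[str] = None         # filled in on evidence persons
    justification: Optional[str] = None  # filled in on conclusion persons
    based_on: List["Person"] = field(default_factory=list)

    @property
    def is_evidence(self) -> bool:
        return not self.based_on

    def evidence(self) -> List["Person"]:
        """Walk the tree and return the evidence persons at its leaves."""
        if self.is_evidence:
            return [self]
        found = []
        for person in self.based_on:
            found.extend(person.evidence())
        return found


census = Person("Jno. Smith", source="census entry")
baptism = Person("John Smith", source="baptism register")
john = Person("John Smith", justification="same parish and parents",
              based_on=[census, baptism])
```

Undoing the conclusion means discarding `john`; the two evidence persons remain untouched, so no evidence is lost.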

Tom Wetmore
AdrianB38 2010-11-18T12:33:29-08:00
My list of entity types - the real world entities
OK Greg - you asked so here it is!!!

This is part 1 of my draft for a data model. My approach is to consider just some of the entity types in each part. This part just examines the "real world" and leaves the sources, citations and assertions, etc of genealogy / family history / evidence management until later.

I mention only a few attributes for each.

Entity types:

1. The PERSON-STORY entity type
This entity type represents the current state of research into the history of an individual person. As such, it needs to accommodate alternate possibilities - e.g. alternative birth dates. (Note use of PERSON-STORY as a name to remind us this is the conclusion version of the entity type. As far as the name of the type goes, Person-Conclusion is the wrong number - there's more than 1 Conclusion to say about someone and Person-Conclusions is unhelpful when you want to refer to several entities of this type - what's the plural then? Person-Conclusionss?).

2. The GROUP-STORY entity type
This entity type represents the current state of research into the history of a group of people, whether formally constituted or not. This is intended to cover any group of people where it makes sense to talk about the group in its own right, e.g. a family of individuals, a regiment, a business partnership, a school of artists. A group of somebody's friends would seldom qualify as a GROUP-STORY entity since it is unlikely to have any existence in its own right.

3. The LOCATION-STORY entity type
This entity type represents the current state of research into the history of a location. The location might be a town, city, state, country, street, house, church, farm, estate, etc. We need to be able to record changes of names, of geographic responsibility, etc.

4. The CHARACTERISTIC entity type
This entity type represents the current information known about a certain characteristic of a single entity. E.g. it contains information about one known name of the person or location, or one physical description, or one occupation of a person, etc.

The CHARACTERISTIC entity type can have several attributes (in a data model sense), viz:
- one mandatory type for that characteristic (e.g. "name", "physical description", "occupation")
- one mandatory value of that characteristic (e.g. a name for the person, coded up somehow; a description of the location; an occupation for the person)
- an optional date / date range during which the characteristic is believed true
- etc

5. The EVENT entity type
This entity type represents the current information known about an event that may have affected or involved one or more individuals, groups and / or locations. E.g. it might contain information about: the San Francisco earthquake; an individual's birth - with details of their parents; the residence of a group of people in a location; a marriage ceremony; the merger of several railroads; the split of a US Territory location into State locations, etc.

Note this is not designed to record comprehensive data about historical events - rather to summarise them in relation to family history.

The EVENT entity type can have several attributes viz:
- one mandatory type for that event (e.g. "disaster", "birth", "residence", etc.)
- an optional date / date range during which the event happened;
- etc

A PERSON-STORY may be characterised by 1 or more CHARACTERISTICs
A PERSON-STORY may be involved in 1 or more EVENTs
A PERSON-STORY may be part of 1 or more GROUP-STORYs
(note that the relationship may include an attribute to describe the role in the group, e.g. mother, father, child, director, officer, soldier)

A CHARACTERISTIC must characterise exactly 1 PERSON-STORY
or exactly 1 GROUP-STORY
or exactly 1 LOCATION-STORY
(these 3 are mutually exclusive and exactly 1 must apply)

An EVENT may involve 1 or more PERSON-STORYs
An EVENT may involve 1 or more LOCATION-STORYs
An EVENT may involve 1 or more GROUP-STORYs
In all 3 cases, the relationship may include an attribute to describe the role in the event. Note further that any single event may involve all 3 possible entity types.
An EVENT may take place at exactly 1 LOCATION-STORY

A GROUP-STORY may be characterised by 1 or more CHARACTERISTICs
A GROUP-STORY may include 1 or more PERSON-STORYs
A GROUP-STORY may be involved in 1 or more EVENTs
(note that the relationship may include an attribute to describe the role in the event, e.g. group-was-present-at, group-was-affected-by)

A LOCATION-STORY may be characterised by 1 or more CHARACTERISTICs
A LOCATION-STORY may be involved in 1 or more EVENTs
A LOCATION-STORY may be the location for 1 or more EVENTs
A LOCATION-STORY may be the location for 1 or more CHARACTERISTICs
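
One way to picture the CHARACTERISTIC rules above is the sketch below. It assumes a CHARACTERISTIC belongs to exactly one of a PERSON-STORY, a GROUP-STORY or a LOCATION-STORY (the third is inferred from the LOCATION-STORY relationships listed), and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Characteristic:
    """Hypothetical CHARACTERISTIC record with a mandatory type and value."""
    type: str                          # e.g. "name", "occupation"
    value: str
    person_id: Optional[str] = None    # owning PERSON-STORY, if any
    group_id: Optional[str] = None     # owning GROUP-STORY, if any
    location_id: Optional[str] = None  # owning LOCATION-STORY, if any
    date_range: Optional[str] = None   # optional period of validity

    def __post_init__(self):
        owners = [self.person_id, self.group_id, self.location_id]
        # The three owners are mutually exclusive and exactly one must be set.
        if sum(owner is not None for owner in owners) != 1:
            raise ValueError("a characteristic must describe exactly one entity")


occupation = Characteristic("occupation", "miller", person_id="P1")
```

Creating a characteristic with no owner, or with two, raises `ValueError`, which mirrors the "only 1 must apply" rule.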
greglamberson 2010-11-19T11:11:51-08:00
I think what you two are getting at is there needs to be flexibility in between the geographic place and the description of that place as noted in a reference to it. Personally, I already enter my place information such that a cemetery is considered a geographic place and not merely a descriptive attached to the geographic place which is the town in which the cemetery is located. This meets my needs, but some might prefer to place all that cemetery info in a place description and not have it be part of the geographic location.

On the other hand, my niece who decides to start doing genealogy (in my dreams) may decide she absolutely wants the GPS coordinates of a particular gravesite to be included as its own geographic place and not a mere description, consistently placed in hierarchical relation to the cemetery, town, and so forth of my data.

As Russ further points out, this all needs to be done without mandating this information be bound into a particular place format. This is an argument that took place in at least one of the GEDCOM update debates in the 1990s. Place information still needs to be transmitted free-form via BetterGEDCOM rather than bound to any particular format. If the originating application allowed this binding to take place, and the user adhered to such a system, then the information needed to rebind or associate that place information with that particular system's geographic identification and mapping system absolutely should be passed on. However, the new system may not use the same system, or the next user may not want to use it, and so BetterGEDCOM should not have any syntax enforcement rules or adherence to any particular systems but should pass on all such associative mapping information faithfully.
AdrianB38 2010-11-19T13:48:11-08:00
Greg - will do as you suggest and load up a sub-page (and sub-sub-page or so!) - I'll need to read up how to do it first.

hrworth 2010-11-19T13:56:02-08:00

Unfortunately or Fortunately I am working from my current genealogy software but trying to identify what is transported in a BetterGEDCOM file / format.

I am trying to take advantage of my current software, as a User, and sometimes it's a balance between what I want to see in one of the features and what the software wants me to do.

I agree with what your "niece" wants to do. In fact, I have started to do that in my database. The good news: I can put GPS information on a Place. But to do that, I have to ignore other warnings that my software gives me to get my place names consistent.

When I choose to Share this information, my software needs to give me Options on what specific information I want to share.

When I receive information from another researcher, I should be given an option on what I bring into my file and how. (New file, or incorporated into my existing file). AND I should be given information on what data is dropped on the floor due to the difference between the sending software and the receiving software.

I think you may be hinting again about a Place Name 'data base' to make sure that the information is in a specific format. I am not sure we want to go there, but we might have to.

BUT, what does need to be understood, and it may be as simple as a flag, is the "data elements" in the Place name.

The particular Jurisdiction for a country needs to be understood by the software at either end. But Transport of that information may need to know if the Jurisdiction description is being transported. Like, City, Township, County, State, Country.

Then, where would the 'description' and/or GPS information be in the data stream?

The software I use makes assumptions about the Jurisdiction description when I enter data. It drops County, Township descriptions or it gives me a warning about it. The good news is that I now have control over the Country being displayed, or I can filter out certain countries.

ttwetmore 2010-11-19T14:37:17-08:00
"I guess my struggle with the "binding things" is where does that belong. In the application or the transport of the information."

"I want the application to allow me to choose what I want to share, and I want to Select what I want presented, based on the application."


I have a slightly unconventional view concerning what BG can be used for, which started 20 years ago when I was considering what Gedcom could be used for. My premise then was something like this. If Gedcom is the only way to move genealogical data from one place to another, we want the Gedcom to be able to represent ALL our data (we know this isn't true but bear with me). Secondly, if the Gedcom format is able to hold all of our data, then why would one want to have a database format in a program that couldn't export its full contents to Gedcom or import the full contents of a Gedcom file to its database? This didn't make sense 20 years ago, it doesn't make sense today, but every genealogical program (except a very few) fails to address this very obvious point and continues using databases that don't heterodyne with Gedcom and lose information on export and import.

I decided to experiment with a solution to this situation with the LifeLines program. There were two parts to that decision. 1) The database of LifeLines consists of records in Gedcom format. 2) The Gedcom records are not limited to the tags and substructures defined by the 5.5 specifications; users can invent their own tags and substructure. In the itsy, bitsy, teeny, tiny world of UNIX genealogical software, my conjectures proved to be true.
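
As a rough sketch of what "records in Gedcom format with user-invented tags" can look like in code (this is not LifeLines' actual implementation; the function names and the `_NICK` tag are made up for illustration), the parser below turns level-numbered lines into a tree without checking tags against any fixed list:

```python
def parse_line(line):
    """Split one Gedcom-style line into (level, xref, tag, value)."""
    level, rest = line.strip().split(" ", 1)
    xref = None
    if rest.startswith("@"):
        # Record-level lines carry an identifier such as @I1@ before the tag.
        xref, rest = rest.split(" ", 1)
    tag, _, value = rest.partition(" ")
    return int(level), xref, tag, value


def parse_record(lines):
    """Build a nested dict tree from the lines of a single record."""
    root = None
    stack = []  # stack[i] is the most recent node seen at level i
    for line in lines:
        level, xref, tag, value = parse_line(line)
        node = {"tag": tag, "xref": xref, "value": value, "children": []}
        if level == 0:
            root = node
            stack = [node]
        else:
            del stack[level:]  # drop nodes at this level or deeper
            stack[-1]["children"].append(node)
            stack.append(node)
    return root


record = parse_record([
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "2 GIVN John",
    "1 _NICK Jack",  # a user-invented tag is stored like any other
])
```

Because nothing here depends on a fixed tag set, a record written back out from this tree would keep the unknown `_NICK` line instead of dropping it.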

So quickly to another point. What should a genealogical database be? The typical answer is an RDBMS. Using this model often is the source of 1) onerous restrictions on how much information can be stored in a database, and 2) the reasons why export and import operations to and from Gedcom tend to be lossy. My long held belief is that the proper database technology for genealogy is the hierarchical model with enough of the networking model to support linking between records. Gedcom records are perfect for implementing as a hierarchical database with networking, as is now being proven over and over with databases using XML to structure their records (at this level of thinking Gedcom and XML are identical).

So now BG. Now I'm NOT going around advocating that the BG model be used both for the transport model and the database model, but to my mind the advantages of this are so obvious that if I were a designer of genealogical software I'd sure be thinking that way. Consider that Gramps today uses the XML form of its data model for archiving its databases in external files. Clearly Gramps has decided that its internal database must hold exactly and only what its data model can specify. They are surely on the right track for doing that. Personally I think that is one of the things that gives Gramps its legitimacy.

Finally to your quotes. First I believe the binding, if I understand what you mean by that, belongs in both the application and the transport. I think they should both hold the same information. I can't see any real argument that they should be different. It might be that the internal representation of things might have more links for performance and those links wouldn't be needed in the transport form, but I can't imagine much else. In my DeadEnds models I do have some two way links (e.g., Events link to Persons and Persons link to Events in mirror-image fashion) and only one direction would truly be needed for the archival and transport state of the data.

As to your second point. I take it to mean that when you want to export information from your database to a transport file you would like to be able to filter the information, both in terms of the set of persons to be transported, and in terms of the information about the persons to be transported. I agree whole-heartedly. I don't think there is any aspect of BG that would hinder this. I think the restricted output should still be in BG, but see a problem with that.

Tom Wetmore
greglamberson 2010-11-19T15:20:46-08:00
Russ and Tom,

I basically agree with everything you've said, except maybe for the concept of binding information. As Russ implies, we've discussed this concept before. I do also agree with you, Tom, that the binding concept resides partially in the application and partially in the transport (i.e., the BetterGEDCOM file) - AS LONG AS the data needed for binding can be included in the BetterGEDCOM file but not the RULES for binding. The receiver of the BetterGEDCOM file may wish to use the same binding system, or may not. This should be optional. BetterGEDCOM should not advocate use of any one geographical system but should also allow the data needed to use such systems to be transported between systems.
GeneJ 2010-11-19T16:04:21-08:00
Ooo ... Ooo ...

Tom wrote:
0 @S444@ SOUR
1 TYPE birth certificate
... blah, blah, blah ...

tickle my belly ... gimme more of the blah blah blah!!!
ttwetmore 2010-11-19T16:59:38-08:00
Please take my comments about binding with a grain of salt, because I don't really know what is meant by that word in our context. If Russ or Greg would like to say what they mean I might understand where I am going astray. To me binding means how the individual data entities, Persons, Events, etc, connect with one another. This may be too limited a view.

When Greg says a receiver of a BG file might use a different binding, I take that to mean that the underlying model of the receiving system might not have the same entity types as BG so would have to do some mapping of information. A good example would be a system that doesn't have events as entity types, that would have to "bind" the BG events into the receiving model's Person entities. Hope this makes sense.

Tom W.
hrworth 2010-11-19T18:48:10-08:00

I hope my "binding" question was "What is Binding". I am not sure what that means, unless it is the Unbinding of the data being sent, and the Binding at the other end. That is, breaking the data elements apart and putting them back together. That is why I was suggesting the application break its bundle of information up into bit buckets that are sent down the highway, and the application at the other end knows what each bucket is and puts it back together the way it wants to put it together for the end user.

Bucket1 = Person information
Bucket2 = Event information
Bucket3 = Source information

Within each bucket there needs to be a pointer, along with the contents, to show the relationship between the contents of the buckets.
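
A minimal Python sketch of the bucket idea, assuming each bucket is a dictionary keyed by an identifier and the pointers are simply those identifiers. All names, ids and values are made up for illustration; this is not a proposed BetterGEDCOM format.

```python
# Three buckets as they might travel in a transport file.
# Pointers between buckets are plain identifiers (hypothetical layout).
BUCKETS = {
    "persons": {
        "P1": {"name": "Jane Example", "event_ids": ["E1"]},
    },
    "events": {
        "E1": {"type": "birth", "date": "1900", "source_ids": ["S1"]},
    },
    "sources": {
        "S1": {"title": "Example birth certificate"},
    },
}


def assemble_person(person_id, buckets):
    """Rebuild one person's view at the receiving end by following pointers."""
    person = buckets["persons"][person_id]
    view = {"name": person["name"], "events": []}
    for event_id in person["event_ids"]:
        event = buckets["events"][event_id]
        titles = [buckets["sources"][sid]["title"] for sid in event["source_ids"]]
        view["events"].append(
            {"type": event["type"], "date": event["date"], "sources": titles}
        )
    return view


jane = assemble_person("P1", BUCKETS)
```

The receiving application is free to assemble a different view from the same buckets; only the identifiers have to survive the trip.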

AdrianB38 2010-11-20T13:30:50-08:00
Tom - re evidence and conclusion version of entities.

I am convinced enough by your experiences to dispense with having 2 entity types for person (e.g. PERSON-STORY and PERSON-EVIDENCE) and just have 1 (e.g. PERSON) with an attribute that flags whether this is an evidence or conclusion version.

The question then arises which of my other entities should have this flag, i.e. which others operate in evidence or conclusion mode.

If we are recording the history of GROUPs and LOCATIONs, it seems clear to me that these must also operate in evidence or conclusion mode as the information against the group or location comes from sources and needs to be evidence-managed. Hence they should have the flag.

Thoughts on that?

I think that a CHARACTERISTIC or an EVENT also operate in evidence or conclusion mode depending on whether they are extracted from the source or represent deduced conclusions. However, I also think that they do not need the flag since they take their mode from the things that they describe. (And further, that means a conclusion EVENT can only relate to conclusion entities and an evidence EVENT can only relate to evidence entities).

Thoughts on this and the lack of a need for flags?
greglamberson 2010-11-20T13:58:13-08:00
In the context of place information and geographic naming systems (GNS), binding is the relational information between these two. This is the only context in which I have been thinking of binding in this discussion.

Using the location Mt. Vernon, IL, here is what I mean.

Place Info
City="Mount Vernon"
Country="United States Of America"
GeoReference-system="Google Maps"

In the above example, I have manually entered place information and one external geographic referencing system with its needed data, expressed in a URI, within that system. I am not using any particular rules of any geographic referencing system to format my manually entered place data, but I also provide the data needed to use the Google Maps geographic referencing system.

So in this case, I provide binding information (i.e., reference to a particular geographic naming system and data needed to reference a particular asset within it) but am not using binding rules (the manually entered place information remains separate and independent from the GNS info).
ttwetmore 2010-11-20T14:59:21-08:00

"Tom - re evidence and conclusion version of entities"

It seems to me that persons and events must come in both versions since we build them both up from evidence. Not to confuse things more than they already are, but I view the evidence/conclusion spectrum as a continuum, not an either/or. The implementation I prefer is one in which a person object can refer to a list of other persons. The persons in the list are the evidence for the person with the list. Importantly the higher level person has its own components, derived from the researcher taking what is felt best from the persons in the list. The source of the higher level person is then the researcher and his/her justification.

Things can get more interesting because there is nothing to keep the researcher from putting this new conclusion person in a list with other persons and creating an even higher level conclusion person to hold that list. We can reach a situation where person records exist in tree structures. Each person in the tree represents a conclusion about all the persons in its own list, and at the bottom are the persons that come direct from evidence. Don't know whether the user would ever want to know about this, of course, unless the user were a professional genealogist who thinks this way anyway.

Well then, to tags. When I was writing my DeadEnds model, first version now 10 years old, I considered putting in a flag. But I decided I didn't need to, since a person record is either going to be in the list of another person's persons or not. If they are in a list they are part of a conclusion person. If they are not they are a conclusion person. So the flag would only indicate whether a person was in another person's list or not, and that check is just as good as checking for an explicit flag. Ditto for events. This has some great implications for a user interface. When the user wants to see the current state of the database, the UI could only show persons that aren't members of lists. But when the user wants to go into research mode and see all the existing data, the UI could have a way of showing the contents of these person trees.
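
The "no flag needed" check can be sketched in a few lines of Python. This is an illustrative sketch, not the DeadEnds implementation; the `Person` class, the function names and the sample records are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    source: str  # where this record's information comes from
    evidence: list = field(default_factory=list)  # lower-level Person records


def top_level_persons(database):
    """Persons that are in no other person's list: the current conclusions."""
    listed = {id(member) for person in database for member in person.evidence}
    return [person for person in database if id(person) not in listed]


def leaf_evidence(person):
    """Walk down a person tree to the records that come direct from evidence."""
    if not person.evidence:
        return [person]
    leaves = []
    for member in person.evidence:
        leaves.extend(leaf_evidence(member))
    return leaves


# Made-up records: three evidence persons combined in two steps.
a = Person("D. Example", "city directory entry")
b = Person("Daniel Example", "census entry")
c = Person("Dan Example", "ship arrival record")
ab = Person("Daniel Example", "researcher's conclusion", [a, b])
abc = Person("Daniel Example", "researcher's conclusion", [ab, c])
lone = Person("Mary Example", "census entry")  # in nobody's list

database = [a, b, c, ab, abc, lone]
current_view = top_level_persons(database)  # what a "current state" UI shows
research_view = leaf_evidence(abc)          # what "research mode" drills into
```

Here `current_view` holds `abc` and `lone`, and `research_view` holds `a`, `b` and `c`. The record `lone` shows the case of a person entered once from a single piece of evidence, which counts as top level without being entered twice.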

When talking about families, groups, characteristics, we're in an area I haven't thought about in depth. My immediate reaction is that families are a kind of group, and a group is usually a kind of conclusion object one creates from lots of different kinds of evidence, so that families and groups don't need flags because there's no need for the distinction. Must say I'm not sure about that at all and your thoughts might be better developed.

I agree with your point about characteristics, as I feel, as you pointed out, that they are best thought of as being bound into the higher level object they are providing information about.

(Aside. I've mentioned the Zoom Information combination algorithms that built up person records into a final set of individuals [100s of millions combined into 10s of thousands, a fairly hefty boil down]. The data structures used in the algorithms were person records that could be structured into trees, exactly as just described. Each "phase" of the algorithms built up another level in the trees as each phase implemented a different scheme for determining whether two "persons" [either single records or trees of already joined records] are likely to be the same. So there is some empirical proof of this concept in the real world.)

Let me give you an example of this multi-level stuff in action.

I had a great(2) and great(3) grandfather, both named Daniel Wetmore, and both of whom lived (among many other places) in Norwich and New London, Connecticut. There was a third Daniel Wetmore, son of my great(2) grandfather, who also lived in those two towns at the same time. All three men are found in different combinations in city directories of New London and Norwich over a time span from 1868 to past 1900. When I started researching these men I didn't know who they really were, since all three immigrated from Nova Scotia at different times that I didn't know, so I had to slowly figure out that there were really three Daniels, and how I was related to them. I made every possible wrong assumption along the way. In fact I started with the assumption that my great(3) grandfather had never come down out of Canada at all, and had died before 1868. It turned out, after much work, that the great(3) grandfather, already aged, was the FIRST of the Daniel Wetmores to come to Connecticut. (I had assumed that all the early records were for my great(2) grandfather.) The evidence I started with was maybe 100 evidence events taken from the city directories, combined with data from the 1860, 1870, 1880, 1900, and 1910 censuses, and some serendipitous discoveries in ship arrival, immigration, and naturalization records. I spent hours doing by hand this "combining" of evidence into higher level persons. As you can imagine I had to re-sort the evidence many times, like putting pieces into a puzzle, before I finally found the right configuration and everything became crystal clear. It was that experience that has made me think about whether there would be any kind of computer support that would have helped me to organize all that data and support my fumbling around with groupings and conclusions. I think it was this experience that has since driven me towards better models and better software support for genealogy.

Tom Wetmore
AdrianB38 2010-11-21T05:27:39-08:00
Tom - thanks for that. I am attracted to the idea that we don't need the flag since the conclusion person is just the end of the chain. That certainly avoids the need to double enter someone like a child who appears (so far) just in one census.

I reckon that a group could appear in both evidence and conclusion versions if one were researching a regiment (say) and had various sources about that regiment.

If we have a soldier who appears just once and once only (in a regimental muster, say), then he would appear once in the "database", with no-one chained from him. But he would then have two relationships - one to the evidence regiment and one to the conclusion regiment. Hmmm. I think my brain is hurting at this point.

I think I shall just annotate my model to say "flag to be confirmed" and think it through later when some evidence management experts tackle it.
greglamberson 2010-11-18T15:21:07-08:00

This is great, but go ahead and put it on a regular page where it can be edited, refined, dissected, etc. The main pages aren't for conclusions only. We've got to move more of the data, whether theory, opinion, or whatever, to the main pages.

I've got some other work to do right now, but I hope to get a ton of reorganization work done and to assimilate information from the discussions onto the main pages. I hope you also will have time to put this on a few of the main pages, starting with the current one and also making your own subpage at: http://bettergedcom.wikispaces.com/Formulation+Of+The+BetterGEDCOM+Data+Model .
ttwetmore 2010-11-18T17:07:02-08:00
Adrian's PERSON-STORY is what I have been calling a conclusion person. Adrian's EVENT is what I have been calling a conclusion event. Adrian's GROUP-STORY is what I call a Group. A CHARACTERISTIC is an attribute with a date sub-attribute attached to it, a useful concept. See my DeadEnds model doc where the same four entity types are discussed under the names Person, Event, Group, and Attribute. LOCATION-STORY is a new idea as far as I can tell, a useful one for showing the evolving of names of geographical entities, similar to the Place entity in other models but with an interesting historical twist.

I don't wish to sound pedantic, but Adrian's model is a good, standard model, with similar concepts to others, but his unusual names may hide those similarities. This is a model consisting of conclusion Persons, Events, Attributes, and Groups, which is an excellent partial set in my opinion. I think it is important to see right up front that Adrian's model fits in with others we have before us so we don't spend too much time arguing differences.

Adrian's model does not cover evidence, though I presume he may add it in part 2. All his entities grow and evolve; none are static and retain exactly the information extracted from evidence. I believe a data model for genealogy must model the evidence persons and evidence events, and that the conclusion persons and conclusion events are built up from them.

In the DeadEnds model the Person and Event entities can be used in "evidence mode" and in "conclusion mode". See the DeadEnds doc for how that is done. I've gone back and forth on whether a data model for genealogy needs separate classes/entity types for conclusion persons and evidence persons (and evidence events and conclusion events). My current thinking is that we do not need separate classes. Again, see the DeadEnds doc for how the two can be distinguished.

It might also be useful to read the document I uploaded about Combination. This covers a real world application (see zoominfo.com to see it in action) in which hundreds of millions of records representing evidence persons are algorithmically combined into tens of thousands of conclusion persons. In that application the same data class is used for the two concepts (they are called persons when they are evidence, and they are called individuals when they are conclusions). The application is not genealogical, where relationships between people are important; rather the application is occupation oriented, where what the persons do and who they work for are the important relationships. However, there is much to be learned about how sophisticated combination algorithms can be applied in a genealogical setting.

Tom Wetmore
ttwetmore 2010-11-18T17:21:20-08:00
I sense some confusion and tentativeness over the concept of an Event. Not necessarily here in this thread, but in a number of places in the discussions. Since I think the concept of an Event is fundamental to any genealogical model, and since I think it is a very simple, and common sensical concept, let me define exactly what it is in my opinion. It consists of:

1. A Type -- e.g., birth, death, census enumeration, land transaction, signing of a will, arrival on a ship, filing of naturalization papers, basically anything that happens that involves one or more persons and has genealogical significance (whatever that means).
2. A Date or Date range -- an Event happens -- this records when that was.
3. A Place or set of Places -- an Event happens somewhere -- this records where (yeah, some people who want to blow holes in every assertion can come up with events that seem to stretch the Date and Place points, but 99.999% of the cases that matter clearly have Dates and Places).
4. A set of Roles being played by persons -- Events involve people, usually more than one person. Even the birth event, which nearly every genealogical software system in the world will try to convince you involves only one person, involves at least three! Gedcom, by forcing you to put the birth event into the record of the person who was born, hides the very important fact that to two other people a very significant event happened on the very same day!
5. A set of attributes that add any other interesting tidbits about the event not covered by the first 4. The first 4 are what make an event an event -- an event is something that happens to one or more persons, at a time and at a place -- they are genealogically significant because they may create or change relationships between persons or they may put a person into a new state of being. The tidbits are just that, tidbits.

Note that this definition does not preclude the Event from being either evidence or conclusion.
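
The five parts above map directly onto a small record type. The following Python sketch is only an illustration under assumed names (`Event`, `Role`, `events_involving` and the sample data are made up, and dates and places are kept as plain strings for brevity); it is not taken from the DeadEnds model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Role:
    person_id: str  # pointer to a Person record
    role: str       # e.g. "child", "mother", "father"


@dataclass
class Event:
    type: str                                       # 1. the Type
    date: Optional[str] = None                      # 2. a Date or Date range
    places: list = field(default_factory=list)      # 3. a Place or set of Places
    roles: list = field(default_factory=list)       # 4. Roles played by persons
    attributes: dict = field(default_factory=dict)  # 5. any other tidbits


def events_involving(person_id, events):
    """Return every event in which the person plays any role."""
    return [e for e in events if any(r.person_id == person_id for r in e.roles)]


# A made-up birth event with three role players.
birth = Event(
    type="birth",
    date="1 January 1900",
    places=["Exampletown"],
    roles=[Role("P1", "child"), Role("P2", "mother"), Role("P3", "father")],
    attributes={"note": "made-up example"},
)

mothers_events = events_involving("P2", [birth])
```

Because the event stands alone instead of living inside the child's record, the same birth turns up when looking at the mother ("P2") or the father ("P3").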

Tom Wetmore
hrworth 2010-11-19T05:58:55-08:00

I have a question about Places.

From what I have seen, there may be another attribute, to the Place as you described.

To me, there is a Place that is defined and you might even get down to a GPS point on a Map.

The other attribute might be What is at that GPS location that might be important.

Two examples: A Church or a Cemetery, by name.

By acknowledging that type of information as part of the Place piece of the Event, when this information is shared between two End Users, the software that the End User is using could then present the end user with a view of everyone who is buried in that Cemetery.

I agree with the rest that you posted.

Thank you,

ttwetmore 2010-11-19T08:40:44-08:00

You make a good point. I have always been a little miffed that GEDCOM doesn't have a tag for cemetery, hospital, court and so on. When I create Event records in LifeLines, which uses Gedcom as its syntax, there is no major problem, because I can invent tags and tag structures, and as long as I am consistent in my conventions the report programs I wrote for LifeLines can deal with the new types.

For example,

2 DATE a date
2 PLAC a place
2 CEME name of cemetery
3 ADDR ... substructure with address of cemetery if desired ...

is how I often add cemetery info in LifeLines.

Here the cemetery is a property of the event, not of the place. This could also be done in this super Gedcom as

2 DATE a date
2 PLAC a place
3 CEME the name of the cemetery

And here the cemetery is a property of the place instead. I don't know which is the better approach.

Here is something interesting about how place references can be used, however, something I learned from Gramps...

If the Place record is an independent record that other records refer to (as in Gramps), then the REFERENCE TO THE PLACE FROM OTHER RECORDS can have attributes. This is a neat idea as it gives properties of the Place from the point of view of the other records (primarily Event records). This keeps the Place record itself clean of the extra info, placing the extra where it is appropriate. Here is an example, once again using a super-Gedcom, and also promoting my event evidence and event person view (there is one of each here).

0 @E101@ EVEN
1 DATE 18 December 1949
1 PLAC @P202@
2 HOSP Lawrence and Memorial Hospital
1 INDI @I333@
2 ROLE child
... roles for mother and father ...
1 SOUR @S444@

0 @P202@ PLAC
1 NAME New London, New London, Connecticut, USA

0 @I333@ INDI
1 NAME Thomas Trask /Wetmore/ IV
1 EREF @E101@
2 ROLE child
1 SOUR @S444@

0 @S444@ SOUR
1 TYPE birth certificate
... blah, blah, blah ...

This provides a good example of the "attributed reference" technique that Gramps and Family Tree Maker use. Note how the Place record only records the place itself (there could be other attributes for more things, but I've just put in the name here). And note how the place reference (using the PLAC tag) in the Event record points to the Place record for New London, Connecticut, but the HOSP attribute is attached to that reference.

This doesn't really get to your issue of being able to share that cemetery with all the people who are buried there. To do that, following your notion, the Place record would have to be enhanced to hold not only a geographic location, but also more details about something at that Place. Should we enhance the Place record so it can also be an institution, or is that really another kind of entity that points to the Place? That is probably the better idea, but it requires an addition to the model, either by making place more than geography or by adding a new entity class. Personally I would opt for a new entity type, as long as we had a way to keep the need for new entity classes from getting out of hand.

Interesting ideas you have brought up.

Tom Wetmore
hrworth 2010-11-19T09:19:02-08:00

Ah, the issue at hand. GEDCOM vs BetterGEDCOM

I would guess, that somewhere in the GEDCOM 5.5 "standards" there is a CEME or HOSP tag. Which is a good thing.

But how many applications create or read those two tags (as examples)?

Until 2 years ago, the software I use only allowed for the Event Name, Date, and Place. The developers were sneaky here: they had actually allowed the user to have two pieces of information in one field on my screen.

When the re-write of the program was done, the developers broke that one field into two. This was a step forward.

I don't get too hung up on the label of the field that I am looking at on my screen, but how will the information show up in reports and charts.

The new field names for an event were now Date, Place, Description.

The Place was very clear as to its purpose, but how to capture the Cemetery, Church, Hospital information that I knew or found? The Description served that purpose. It's also helpful for other Events.

So this third data element lets me get a little more detailed, with more or less free-hand information, but if it is entered properly I can generate a report, for example, of everyone buried in a specific Cemetery.

For another discussion, an Event / Fact has to do with an Address for an Event.

A Census Record comes to mind. The date is easy (census year), the Place is easy, (maybe), Place being a Jurisdiction, but there may or may not be more detailed information that may not be included in a GEDCOM but should be available in a BetterGEDCOM.

Enumeration District is one important item, but what about a House number or street address that might be in that Census Record.

Just to expand what "a place" is.

The BetterGEDCOM, in my opinion, should be able to pass this information, if included, from one application to another, and present the information at the 'other end'. What either end does is not material; it's just that there may be other elements in a Place.

ttwetmore 2010-11-19T09:58:41-08:00

More good points. Having a description catch all field can go a long way. Other programs allow one to attach notes with similar usefulness.

I've been working on updating the DeadEnds data model recently and thought it might be interesting to show you what the current definition of the contents of a DeadEnds Event record is:

eventContent :: eventType date? placeRef* personRoleRef* eventAttr* eventRef* urlRef* note* noteRef* sourceRef*

Don't be put off by the odd notation. All this means is that an Event object's contents consist of: an event type (a tag indicating the type of event); an optional date substructure; references to any number of Place entities; references to any number of Person records who play roles in the Event; any number of attributes specific to the Event; references to any number of other Event records; any number of references to URLs (e.g., local files, web pages) that have information about the Event; any number of note substructures within the Event; references to any number of separate Note records; and references to the Source records this Event record is based upon. The ? means optional and the * means any number including zero.

The issue we are talking about, that is, binding things like cemeteries, hospitals, churches, courts, etc, to Events is intended to be done through the event attributes. Each attribute is a tag value pair, where the tag is a string and the value is either another string or a deeper substructure of information. The extra info could also be put in the note substructures or in separate note Records (useful if other records want to share the same note information).

In the DeadEnds model, the event attributes provide a long list of tags that are useful for defining Event attributes, but the user is also allowed to create their own attribute tags as long as they don't conflict with existing tags.

I should also point out that all the reference elements in this definition are "attributed references" as mentioned in an earlier discussion. For example the definition of placeRef in the DeadEnds model is:

placeRef :: "placeref" : [ id placeRefAttr* ]

This means that a place reference in an Event record consists of an element (or node is another good term) with the tag "placeref" that has a substructure that consists of an id (the UUID that is the index of the Place record), and an optional set of attributes that are appropriate in the context of place references. So, for example, a Place reference in an archival string format might be something like:

placeref: [ id: abcbd-aedef-abece-4d7f ; type: hospital; name: Lawrence & Memorial Hospital; enumerationDistrict: ward 4]
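
As a rough illustration of how a receiving program might read such an attributed reference, here is a small Python sketch. It assumes, purely for this example, that attributes are separated by ";" and that each one is a "key: value" pair; the real DeadEnds archival syntax and its escaping rules are not given in this thread, so `parse_placeref` is a hypothetical helper, not part of any model.

```python
def parse_placeref(text):
    """Split an archival-style place reference into (place id, attributes).

    Assumes ';' separates attributes and the first ':' in each attribute
    separates its key from its value. Values containing ';' or ']' are
    not handled by this sketch.
    """
    tag, _, body = text.partition(":")
    if tag.strip() != "placeref":
        raise ValueError("not a place reference")
    body = body.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("expected [ ... ] after the tag")
    attrs = {}
    for item in body[1:-1].split(";"):
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"expected key: value, got {item!r}")
        attrs[key.strip()] = value.strip()
    place_id = attrs.pop("id")  # the id points at the separate Place record
    return place_id, attrs


place_id, ref_attrs = parse_placeref(
    "placeref: [ id: abcbd-aedef-abece-4d7f ; type: hospital; "
    "name: Lawrence & Memorial Hospital; enumerationDistrict: ward 4]"
)
```

The id identifies the shared Place record, while everything left in `ref_attrs` (type, name, enumerationDistrict) belongs to this one reference only, so the Place record itself stays clean.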

Tom Wetmore
hrworth 2010-11-19T10:33:34-08:00

I guess my struggle with the "binding things" is where does that belong: in the application or in the transport of the information?

Quite simply, the BetterGEDCOM is the transport of information provided. The application generates what the application 'has been told to send along' and the application at the other end accepts that information, and presents what ever it wants to present in the way the application so chooses.

I want the application to allow me to choose what I want to share, and I want to select what I want presented, based on the application.

What I hope you are presenting is what might be in that transported information, and making place holders for the pieces of information.

GeneJ 2010-11-20T08:27:16-08:00
What information do we need/have about current genealogy programs in use?
Do we need to begin to define the fields that exist in current versions of genealogy software programs?

If that is the case, what is the best way to begin that process? What resources will we need to accomplish such a task in good fashion and how would we organize those resources?

Are there IP considerations (copyright, trademark) or licensing issues we should address before we begin that process?
greglamberson 2010-11-20T11:19:50-08:00
Please review my rough sketch on use of templates which helps address this very issue. Sorry I didn't see this earlier, but I've been feeling a little under the weather and have restricted my activities.

The page is here:

greglamberson 2010-11-20T12:21:11-08:00
To amend my comments, it will be largely up to software vendors to deal with such mapping concepts in practice. We certainly need to be cognizant of what software products are doing as a practical matter. However:

1. I don't think we're even close to the point of worrying about this in detail; and
2. Yes, I do have concerns about how much detail we get into regarding actual software products (at some point, anyway).

Regarding how templates address this issue: Templates allow flexibility in how data fields in BetterGEDCOM are mapped to fields in software programs. While these templates will be restricted to certain areas (e.g., names), these areas with a high degree of specificity are the biggest problem when it comes to mapping data to different systems.
greglamberson 2010-11-20T11:17:46-08:00
Use Of Templates

I have added a page subordinate to the Data Elements page called Use Of Templates. I've illustrated (roughly) the idea of templates as it could be applied to person-names, but this concept is similar to what can be done with dates, places, and other kinds of data.

Please have a look and comment, as I think this is a critical piece of our work.

The page is here:


greglamberson 2010-11-20T15:02:46-08:00
Source and Citation

I see you have edited the information regarding source and citation. This is great, except that a SOURCE entity is absolutely NOT the same thing as a CITATION entity. These two must not be intermixed within data. I suggest you make a separate CITATION entity and page.

Please keep in mind that these two items as commonly understood in terms of GEDCOM and genealogy databases are vastly different than the equivalent terms in discussing genealogy methodology.
GeneJ 2010-11-20T18:24:36-08:00
Some terminology:

Genealogical Citation/"reference note" (aka footnotes or endnotes):
From _Evidence Explained_ 1st ed, electronic, p. 828, "reference note: a citation ... or comment placed at the bottom of a page or at the end of a piece of writing and keyed to a particular statement in the text; its purpose is to identify and/or discuss the source of the specific statement made in the text."

A source:
From the same work, p. 828, "source: an artifact, book, document, film, person, recording, website, etc., from which information is obtained. Sources are broadly classified as either an original source (q.v.) or a derivative source (q.v.), depending upon their physical form."
greglamberson 2010-11-20T18:27:25-08:00

Agreed on basically everything except the need to repeat a citation. I just don't know about this, but right now I think of a citation as a unique entry. I'm not sure what you mean yet, and I'm not sure whether I agree with that or not.
GeneJ 2010-11-20T18:57:12-08:00
Hi Louis Kessler:

You wrote, "...implement sources and citations so that all existing GEDCOMs would still be valid."

Is there a URL or quick reference guide to the GEDCOM source fields or tags (sorry, non-techie here)?

You wrote, "worst thing about GEDCOM right now is that one citation needs to be repeated exactly in all the events it is referred to. This must be fixed .."

How would we fix the repetition, since we are pointing source citations to specific data/information "bits"?

PS, I note at http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gcch2.htm#Example , existing GEDCOM uses the term, "Source-Citation" and "Source-Record"
louiskessler 2010-11-20T19:55:55-08:00


Yes. In GEDCOM, SOURCE_RECORD is what we are calling "Source" and SOURCE_CITATION is what we are calling "Citation".

In GEDCOM, the SOURCE_CITATION is not a record. It has to be inserted in full under the tag it pertains to in every instance it is referenced.

This can be fixed by making the Citation an entity, and then simply referring to the entity identifier when it is needed. The detail of the Citation will then only be included once under its Citation entity.
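
As a rough illustration of that fix, here is a minimal Python sketch. It is not a proposed BetterGEDCOM schema: the class names (`Source`, `Citation`, `Event`), the field names, the identifiers `S1` and `C123`, and the sample roll/page values are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    id: str
    title: str


@dataclass
class Citation:
    id: str
    source_id: str  # every citation points at one source
    detail: str     # where within the source; stored once, here


@dataclass
class Event:
    kind: str
    citation_ids: list = field(default_factory=list)  # identifiers only


# Hypothetical sample data.
sources = {"S1": Source("S1", "Passenger list (microfilm)")}
citations = {"C123": Citation("C123", "S1", "roll 12, page 4, line 17")}

# Two events refer to the same citation by identifier; nothing is copied.
events = [
    Event("Birth", ["C123"]),
    Event("Immigration", ["C123"]),
]


def citation_detail(event):
    """Look up the stored detail for each citation an event refers to."""
    return [citations[cid].detail for cid in event.citation_ids]


for ev in events:
    print(ev.kind, citation_detail(ev))
```

Correcting the detail in `citations["C123"]` would then change what every referring event reports, which is the point of making the Citation an entity.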
louiskessler 2010-11-20T20:40:48-08:00


I have lots of examples of citations in GEDCOMs that are used multiple times. Most are used for different events for the same person, e.g. a death certificate gives multiple bits of information: birth, marriage, death, dates and places, notes.

But some are even worse. An example might be a ship's record. The source is the Microfilm reel. The citation is the Microfilm roll number and location on the reel. Maybe the page and even the line number(s).

For that citation, it will indicate the names and ages and other information for the whole family. You may connect that citation to the birth and immigration events and to text within notes attached to several different people. It can confirm the marriage. It might identify the country of origin.
greglamberson 2010-11-20T20:47:39-08:00

While you might functionally consider those to be the same citations, in fact they structurally should be separate. I think.

Structurally there needs to be opportunity to make notes for each time you cite a particular source, even if that's not how you use them.

At least, that's what I think I think.
louiskessler 2010-11-20T21:14:49-08:00


You can still make notes for each citation. The note will be included inline.

e.g. in Pseudo-GEDCOM:

2 DATE 1845
3 CITE @C123@
4 NOTE Calculated from age 6 in 1851.
2 PLAC Paris, France
3 CITE @C123@
4 NOTE Assumed Paris since that's where they arrived from.
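
To show how a program might read that layout, here is a small Python sketch that turns level-numbered lines into a nested structure. The function name `parse_levels` and the dictionary layout are assumptions for illustration only, and the `CITE` tag is the pseudo-GEDCOM tag from the example above, not a tag in the existing GEDCOM standard.

```python
def parse_levels(lines):
    """Turn level-numbered lines into nested dicts: tag, value, children."""
    root = {"tag": "ROOT", "value": "", "children": []}
    stack = [(-1, root)]  # (level, node) for the current chain of ancestors
    for line in lines:
        level_text, tag, *rest = line.split(" ", 2)
        level = int(level_text)
        node = {"tag": tag, "value": rest[0] if rest else "", "children": []}
        # Pop back until the top of the stack is this line's parent.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root


sample = [
    "2 DATE 1845",
    "3 CITE @C123@",
    "4 NOTE Calculated from age 6 in 1851.",
    "2 PLAC Paris, France",
    "3 CITE @C123@",
    "4 NOTE Assumed Paris since that's where they arrived from.",
]

tree = parse_levels(sample)
for item in tree["children"]:
    cite = item["children"][0]
    note = cite["children"][0]
    print(item["tag"], item["value"], "->", cite["value"], "|", note["value"])
```

Both items point at the same citation identifier, while each keeps its own note about how the citation was used.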
GeneJ 2010-11-20T21:16:23-08:00
In 2009, Mark Tucker (www.thinkgenealogy.com) did some great work with FTM 2009, Legacy 7, and RootsMagic 4, comparing the approach each took to implement the "Book: Basic Format" from the QuickCheck model in _Evidence Explained_.

In May 2009 (http://www.thinkgenealogy.com/2009/05/03/better-online-citations-details-part-2-gedcom/ ) he examined the data/information loss that resulted when files containing that single "templated" citation were "conveyed" (his term) via GEDCOM.

In his initial conclusion, Mark wrote, "we discovered ... the ... three applications that support EE-style templates do so slightly differently on the input side (part 1) and vary greatly when it comes to GEDCOM output. As it stands today much is lost in the GEDCOM export rendering rich citations into blobs of text."

Mark's study was limited to one QuickCheck model/template, when there are many. When we consider user customization and alternative citation style-guides in use (often in the same user file) the number of different genealogy source and citation models in use seems almost endless.

...I think I need an Aspirin.
louiskessler 2010-11-20T21:33:54-08:00

People don't realize how difficult a task creating GEDCOM was in the first place. The LDS actually did quite a commendable job. Their success could be measured by the fact that 99% of genealogy programs are GEDCOM compatible in at least a basic way.

I think the goals of BetterGEDCOM shouldn't necessarily be a total rewrite. Maybe just: (1) convert to XML and UTF8, (2) Only change what needs to be changed, and (3) keep it absolutely as simple as possible, because the more complicated constructs will not be used.

If it's kept simple, it may have a chance of being adopted.
louiskessler 2010-11-20T21:48:20-08:00

... further on that:

What's wrong with starting with LDS's GEDCOM XML Release 6.0 Draft and then modifying it only where necessary?

It would take BetterGEDCOM a long time to get close to the level that the draft reached.
greglamberson 2010-11-20T21:55:36-08:00

You're going to make me crazy the way you keep referring to citations when for our purposes, all that information is source information. Maybe we need to change our language for the sake of clarity... For purposes of BetterGEDCOM's work, maybe we could refer to the link of a source to a fact (or whatever) as a "Source Referral," or just a referral for short? I don't know, but we've got to do something. I don't think we can keep expecting everyone to use the same terms for different things in different circumstances (as is now the case). Ideas?

Regarding the "citation models," as you call them (which, for our purposes, refer to SOURCE information), this is one place where my idea of using templates would be ideal. New templates could be defined independent of applications and used accordingly. Of course, how software developers make such a system work is an entirely different matter. As you have pointed out, the problem in TMG, for example, is hard-coded and so is not easily resolved. Anyway, we still need to define categories of source ("citation" for you) elements. I don't know what I'm talking about now, though, so I hope you know what I mean so you can start talking intelligently. I need to read my ESM EE.


I think you're looking for the GOALS page and discussion. ;-)
hrworth 2010-11-24T21:33:21-08:00
greglamberson 2010-11-20T15:20:41-08:00
Also, in regard to this phrase: "BetterGEDCOM exports ... will need to represent citations for each bit of data as ... though the source is the identified BetterGEDCOM and the 'source of the source' is that which was described in the maker's source file."

This is a very important concept, but it cannot be applied universally to BetterGEDCOM as a file format for a few reasons. First, citing a source is a function of an application. Also, perhaps the researcher is merely moving data between applications. We should strongly advocate for this functionality within an application, but requiring this of an export process by a given application is neither possible nor advisable universally.
hrworth 2010-11-20T17:02:45-08:00

I agree that a Source and a Citation are two separate pieces of information.

Would it be true, for the BetterGEDCOM, that a Citation should NOT be allowed with a Source?

Is it also true, that a BetterGEDCOM might have a Source, without a Citation?

I am guessing that One Source, in a give file being shared, would have many Citations.

As the data is being assembled for transport, a Source, with some sort of Identifier would be put into the bucket to be sent along.

Any Citation from that Source, would be put into the data stream at different points in that file, associated with the Person or Event that is documented by that Citation.

These Citations should then have a reference back to the Source.

Is that how this would work?

I propose, for this discussion, that a Source might enter the stream without an Associated Citation. This might happen based on the software being used, that is not requiring a citation.

Should the BetterGEDCOM have any rules about this, if I am correct?

I propose that there should not be a rule of Not allowing a Source, without a Citation.

But a Citation without a Source would have such rules.

Does this make sense?

greglamberson 2010-11-20T17:21:37-08:00
Russ said:

"..a Citation should NOT be allowed with a Source?"
I think so, yes.

"Is it also true, that a BetterGEDCOM might have a Source, without a Citation?"

"...One Source, in a give[n] file being shared, would have many Citations."

"...Is that how this would work?"
Yes, precisely.

"Does this make sense?"
Yes. The definition of the data model can and will include rules for such things as dependencies, and a citation should have as one dependency a source reference.
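
A dependency rule of that kind could be checked along the following lines. This is only a sketch under assumed data shapes: the function name `check_references`, the plain dictionaries, and the sample identifiers are hypothetical.

```python
def check_references(sources, citations):
    """Return a list of problems; an empty list means the data is consistent.

    sources:   dict of source id -> title
    citations: dict of citation id -> source id (None if missing)
    """
    problems = []
    for cid, sid in citations.items():
        if sid is None:
            problems.append(f"{cid}: citation has no source reference")
        elif sid not in sources:
            problems.append(f"{cid}: unknown source {sid}")
    # A source with no citations is deliberately not reported as a problem.
    return problems


# Hypothetical sample data: S2 has no citations, which is allowed.
sources = {"S1": "Parish register", "S2": "Family Bible"}
citations = {"C1": "S1", "C2": None, "C3": "S9"}

for problem in check_references(sources, citations):
    print(problem)
```

The rule is one-directional: a Citation depends on a Source, but a Source may stand alone.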

In databases, a citation is a very simple thing with some notes but otherwise just associating a source and a fact (or whatever you prefer to term events/characteristics, etc.). This is vastly different from a citation in genealogical methodology, where a citation includes a great deal of information a database includes within its source entity instead.
hrworth 2010-11-20T17:34:09-08:00

I must have mistyped something:

"..a Citation should NOT be allowed with a Source?"

That, should be "a Citation would not be allowed without a Source".

I wasn't addressing the Citation Notes, but now that you reminded me of it we should add it to the discussion.

So, we have a Source. It's Identified. It may or may not have any associated citations. But, it Might have a Source Note.

A Citation, on the other hand, should NOT be allowed without a reference to a Source.

It too, may or may not have a Citation Note.

I think that with Source Notes and Citation Notes, we are gathering information on how we want to Evaluate that Source in general, and Citation in general.

In other words, to evaluate our Evidence.

It may or may not end up in a Conclusion at that point in time.

I had a book (source) that has lots of information in it. But the more I worked with other evidence, that data contained in that book was less reliable. That is where the Source Notes would come in handy.

Since I stopped looking at that book, I do want a reminder, in case I picked it up again, to remember why I had stopped looking at that book.

Now, if I were to Share my research, I might want to include the source notes. The person receiving that file, looking at my 'conclusion' (if I had gotten that far), might wonder why I had ignored that Source. The Source notes would supply my reasoning for not including that source in my file.

greglamberson 2010-11-20T18:02:23-08:00
Well, I read it like you meant it, Russ.
louiskessler 2010-11-20T18:05:32-08:00
We need to implement sources and citations so that all existing GEDCOMs would still be valid. Therefore, Citations must have sources, but Sources do not need citations.


2 SOUR @S1@
1 ...

is a Source without a citation, and

2 SOUR @S1@
3 tag ...
1 ...

is a source with a citation, the citation being everything level 3 and lower after the SOUR tag.
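
The distinction can be expressed as a small Python sketch: a `SOUR` line has a citation if deeper-level lines follow it directly. The helper names and the sample record are assumptions for illustration; the tags used (`BIRT`, `DEAT`, `SOUR`, `PAGE`, `OCCU`) are existing GEDCOM tags, but the values are made up.

```python
def level_of(line):
    """Return the numeric level at the start of a GEDCOM-style line."""
    return int(line.split(" ", 1)[0])


def citation_lines(lines, sour_index):
    """Return the lines forming the citation under the SOUR line at sour_index.

    These are the lines immediately after it that sit at a deeper level.
    An empty result means a source reference without a citation.
    """
    base = level_of(lines[sour_index])
    found = []
    for line in lines[sour_index + 1:]:
        if level_of(line) <= base:
            break
        found.append(line)
    return found


# Hypothetical record fragment.
record = [
    "1 BIRT",
    "2 SOUR @S1@",
    "1 DEAT",
    "2 SOUR @S1@",
    "3 PAGE 42",
    "1 OCCU Shoemaker",
]

print(citation_lines(record, 1))  # source without a citation
print(citation_lines(record, 3))  # source with a citation
```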

And BetterGEDCOM will need both Source and Citation Entities.

The worst thing about GEDCOM right now is that one citation needs to be repeated exactly in all the events it is referred to. This must be fixed.
greglamberson 2010-11-23T01:14:10-08:00
Please note: Data moved
This page was getting pretty unwieldy, and so all data on the individual entities has been moved to appropriate sub-pages. Please move discussions specific to individual entities to the appropriate individual pages.
louiskessler 2010-11-27T10:51:26-08:00
Eliminate Facts
I think the concepts of "properties", "facts" and "characteristics" should be eliminated from BetterGEDCOM, and Events or maybe "Event-States" should be used instead.

My changes to the Data Elements page describing my recommendation are:

Events, Properties, Characteristics and Facts

It is misleading to believe that there is a real difference between them. The differentiator is usually that properties, characteristics and facts do not have a date assigned but events do. In reality properties, characteristics and facts do have dates associated with them, but they are often unstated. e.g. Occupation definitely has a date range. Eye Color is usually for the life of the person. Hair Color: This can change multiple times in a person's life. It would be best to classify these all together as a single entity. This would eliminate confusion as to where to place these items.

The only downside of the "Event" term is that it is thought of as something that happens, e.g. Someone's hair turned gray, whereas a property, characteristic and fact are thought of as an existing state, e.g. Someone's hair is gray. So a single unambiguous term should be agreed on as the Entity that will refer to events, properties, characteristics and facts. Maybe something like: "Event-State"?

(a) Event entity
Events are usually thought of as an association between:
1) one or more people
2) at a date/time or over a period of time
3) at a given location
An event without a date/time or period of time is usually thought of as a property, characteristic or fact. An event without people would be associated with the location (e.g. House burns down). An event without people and a date/time could be thought of as a property, characteristic or fact of a location.

(b) Property, Characteristic or Fact entity
A property, characteristic or fact describes something always true of someone or something. However, there is no difference between a characteristic and an event without an associated date/time.
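
To make the proposal concrete, here is a minimal Python sketch of a single combined entity, with a helper that reports how each item would conventionally be labelled according to the rules of thumb above. The class name `EventState`, its fields, and the label strings are assumptions for illustration, not agreed BetterGEDCOM terms.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EventState:
    kind: str
    people: list = field(default_factory=list)  # zero or more person ids
    date: Optional[str] = None                  # date, period, or unknown
    place: Optional[str] = None


def conventional_label(item):
    """Label an item the way the text says it is 'usually thought of'."""
    if item.date is None:
        if item.people:
            return "characteristic of person"
        return "characteristic of location"
    if item.people:
        return "event"
    return "event at location"


# Hypothetical sample items.
items = [
    EventState("Eye color", people=["I1"]),
    EventState("Immigration", people=["I1", "I2"], date="1851"),
    EventState("House burns down", date="1902", place="Main Street"),
    EventState("Building material", place="Main Street"),
]

for item in items:
    print(item.kind, "->", conventional_label(item))
```

All four items share one structure; the label is derived from which optional parts happen to be filled in, which is the argument being made for a single entity.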
hrworth 2010-11-27T11:06:47-08:00

I think you may be suggesting some "rules" that may not be correct. Like an Event without a Date. That DATE may not be known at the point in time when the data was entered into my database.

Also, you have not left room for any Evidence that the Event took place, or any notes that a user may have entered about that event.

louiskessler 2010-11-27T11:11:17-08:00

Of course any item may have data missing because it is not known. All data items would be optional.

I'm actually just trying to get the concept here that an Event covers it all. Details of what can be assigned to Events, such as evidence, is another matter and I'm completely open on that.
greglamberson 2010-11-27T11:15:28-08:00

To my knowledge, we haven't really gotten into this area of discussion here. I don't think we have set definitions for these terms, and I am not aware of what others mean by them. The idea that characteristics and facts don't change (i.e., they have no date element) makes little sense to me. However, I do think defining things precisely has value, and as such, there is logical difference between something that happened (i.e., an event) and something that describes someone (i.e., a characteristic).

Yes, I think I totally reject the idea that a characteristic doesn't have a time element. Isn't an occupation a characteristic?
ttwetmore 2010-11-27T11:18:55-08:00
I just started a new topic before seeing this thread.

I just want to go on the record for being in complete opposition to the changes and ideas being expressed here. You can see what I wrote on the other thread to get my main point.

Once you boil everything away there are only two primary concepts in any data model -- the objects that are being modeled, and the attributes that they have that can distinguish one instance of an object from another. In the genealogical data model the event is one of the most important of the data objects. Trying to convert an event into an attribute would be disastrous for any genealogical model.

Tom Wetmore
ttwetmore 2010-11-27T11:28:43-08:00
Quote: "An event without a date/time or period of time is usually thought of as a property, characteristic or fact."

Huh? By whom? An event without a date or time period is not an event so this statement makes no sense.

You are 1) taking a statement that makes no sense; and 2) combining it with what you think some imaginary person might think when they encounter that meaningless statement; in order to 3) justify a change in terminology and definition of concepts that goes counter to some of the most important concepts in genealogical modeling.

I wish to remain totally civil in these discussions so I don't want to offend. But you can't go and change wording with an entirely new and radical twist without some discussion first.

Tom Wetmore
louiskessler 2010-11-27T12:09:11-08:00


Please see my comment on your other post. I'd be happy if you updated the wiki pages.

AdrianB38 2010-11-27T12:56:42-08:00
Louis, For a long time I have believed that the Attribute (a.k.a. Characteristic) and Event in GEDCOM had no material difference.

However, we seem to have a consensus that the Event in BetterGEDCOM should apply to one or more people. That introduces a serious difference between Characteristic and Event.

Let me explain:
1) The two concepts of Characteristic and Event do have much in common. I could suggest that there is a super-entity called "Fact" (say) from which Characteristic and Event inherit many things.

2) In English, the two concepts of Characteristic and Event are subtly different. That difference is tricky to explain because the English language is difficult to pin down. One might automatically start thinking of events taking place at a given time while characteristics are true over a long period, and may even be timeless, but it is also clear that we can come up with counter-examples of events taking place over a long time (e.g. 1914-18). So English isn't much help - it suggests there is a difference but doesn't confirm it.

3) Let us assume now that we all accept that Events in BG can apply to one or more people. Then we can highlight the following differences between Characteristic and Event:
(a) A Characteristic applies to one Person and one Person only. Also, it must apply to someone (or something or somewhere...)
(b) A Characteristic _must_ have a value - it makes no sense to say "Her name was " and leave the value out. Nor does a Physical Description with a blank description make any sense.
(c) An Event _may_ apply to one or more Persons (or Places or Groups or...). Note that it might _not_ apply to anyone in the datafile (e.g. I might enter details about the San Francisco Earthquake as an Event a long time before I link my GG-uncle to it. During that time there is no relationship between that Event and a Person).
(d) An Event does _not_ need to have a value. It might be useful to have a short description (e.g. "San Francisco Earthquake" - indeed, it wouldn't make much sense without it) but in many cases there is absolutely no need for a description (e.g. a birth generally contains all the text to define it in other items and a further description could just generate spurious repeats).

In summary, we have complete differences in the cardinality and optionality of the relationship between Characteristic or Event (at one end of the relationship) and Person (at the other end), plus complete differences in the optionality of the "value".
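
Those differences in cardinality and optionality can be sketched in Python as two separate types with different validation. The class and field names here are hypothetical, and the checks cover only the rules listed above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Characteristic:
    kind: str
    subject_id: str  # exactly one subject, and it is mandatory
    value: str       # the value is mandatory too

    def __post_init__(self):
        if not self.subject_id:
            raise ValueError("a characteristic must apply to someone or something")
        if not self.value.strip():
            raise ValueError("a characteristic must have a value")


@dataclass
class Event:
    kind: str
    participant_ids: list = field(default_factory=list)  # zero or more
    description: Optional[str] = None                    # optional value


# An event may exist before anyone is linked to it.
quake = Event("Earthquake", description="San Francisco Earthquake")
print(quake)

# A name with no value is rejected.
try:
    Characteristic("Name", "I001", " ")
except ValueError as err:
    print("rejected:", err)
```

A single merged Fact type would have to drop both checks in `Characteristic`, which is the loss of validation being described.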

If we only consider the super-entity of Fact, then we must have the least restrictive details for Fact, i.e. Fact has an optional value and one Fact may have a relation to one or more Persons (or Groups or...)

Any Data Model like that is therefore saying that:
(a) A Person's name might not apply to anyone.
(b) The Person's Social Security Number might apply to multiple Persons at once.
(c) The statement that a Person's "Name is " is a complete statement that needs no further investigation.

Frankly, such a Data Model doesn't seem to help anyone.

Worse still, when we come to the XML for the GEDCOM, because Event has to be a top level tag (to allow for Events with no associated people), then if we only have Facts, then the sub-type of Characteristic (which we are not, in this view of the world, splitting out) must also be top level tags. So someone's name(s) rather than being embedded inside the XML for the Person, must be split out like any Fact. In other words (and apologies for the crudity and over-simplicity of this XML), instead of
<Person Id="I001">
<Characteristic Type="Name">John Smith</Characteristic>
</Person>

we would need something vaguely like
<Person Id="I001"></Person>

<Fact Id="F001" Type="Name">John Smith
<Person>PersonPointer=I001</Person>
</Fact>
Not sure if that makes sense in XML terms.

Now, while we have to have this construction for Events, it seems perverse to split things like names so far off from the people to which they apply.

In summary, to work only at the level of the Fact super-entity that embraces Characteristics and Events, ignoring those concepts, results in XML that is bigger than necessary plus perversely difficult to read alongside a Data Model that omits crucial validation information about some of the Facts concerned.
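
The two layouts can be compared with a short Python sketch using the standard `xml.etree.ElementTree` module. The element and attribute names (`Person`, `Characteristic`, `Fact`, `PersonRef`, `Ref`, and the `Root` wrapper) are illustrative assumptions, not a proposed BetterGEDCOM vocabulary.

```python
import xml.etree.ElementTree as ET

# Layout 1: the name is embedded inside the Person element.
person = ET.Element("Person", Id="I001")
name = ET.SubElement(person, "Characteristic", Type="Name")
name.text = "John Smith"

# Layout 2: the Person is empty and the name is a separate top-level Fact
# that has to point back at the person it belongs to.
root = ET.Element("Root")
ET.SubElement(root, "Person", Id="I001")
fact = ET.SubElement(root, "Fact", Id="F001", Type="Name")
fact.text = "John Smith"
ET.SubElement(fact, "PersonRef", Ref="I001")

print(ET.tostring(person, encoding="unicode"))
for child in root:
    print(ET.tostring(child, encoding="unicode"))

# Count the elements each layout needs for the same information.
embedded_count = len(list(person.iter()))
split_count = len(list(root.iter())) - 1  # ignore the wrapper element
print(embedded_count, split_count)
```

In this sketch the split layout needs an extra element and an extra identifier for every name, and a reader has to follow the pointer to find out whose name it is.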
AdrianB38 2010-11-27T13:09:30-08:00
Louis - picking up on something you said on another thread - I'm in total agreement with the idea that a Characteristic can have an optional date or date range associated with it, (e.g. "Military Rank = Corporal from December 1917 to October 1918" followed by "Military Rank = Sergeant from October 1918 onwards"). They could also have a Place (e.g. "Occupation = Shoemaker in Nantwich from 1851 onwards") and just about any other attribute that Events have.
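
As a sketch of how a dated Characteristic might behave, the following Python example stores the two Military Rank values with date ranges and looks up which one applies on a given day. The class name `DatedValue`, its fields, and the choice of the first day of each month as the boundary are assumptions, since only months are given above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatedValue:
    kind: str
    value: str
    start: Optional[date] = None  # None means unknown or unbounded
    end: Optional[date] = None    # exclusive upper bound

    def holds_on(self, day):
        after_start = self.start is None or self.start <= day
        before_end = self.end is None or day < self.end
        return after_start and before_end


ranks = [
    DatedValue("Military Rank", "Corporal", date(1917, 12, 1), date(1918, 10, 1)),
    DatedValue("Military Rank", "Sergeant", date(1918, 10, 1)),
]


def value_on(values, day):
    """Return every value that applies on the given day."""
    return [v.value for v in values if v.holds_on(day)]


print(value_on(ranks, date(1918, 3, 15)))
print(value_on(ranks, date(1919, 1, 1)))
```

Leaving both `start` and `end` empty gives the undated case, so the same structure covers characteristics with and without dates.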
gthorud 2010-12-04T13:31:07-08:00

Thanks to AdrianB for a thorough analysis.

I think it might be useful to consider some implications that the choices may have in practical terms - i.e. the implications for programs.

I am making the assumption that an event would be extended to contain a value (at least one) and possibly a short description that would contain the name of user defined attributes (the current meaning of attributes) when the name can not be carried by standard TAGS. And an event could refer to several locations.

If we choose to have only events it will allow a program to display all events in a list of events. This may simplify the user interface. You can utilize the functionality that most programs have for generation of sentences, also for typical attribute information. You only need one set of functions to handle all types of facts (using Brian's definition).

But showing all facts in an event list may also use more screen space than necessary. Simple attributes with a value do not need screen space set aside for person, date and places. Attributes with a binary value (a flag) could be displayed by a small icon, so you could easily display the value of 10 such attributes in the space set aside for one event.

A program could look at the received event and could store the info internally as a classic attribute or flag depending on the event type or depending on the amount of data. One problem would be if the data content of an event is not standardized, so one day you will receive a simple value for an event type (an attribute) and the next day it also contains references to persons and places. If you then have separate areas on the screen where the info is displayed, or different ways to display the info (eg. icon) or different functionality for processing the info - depending on the amount of info in the event type - you are in trouble. Or, still assuming the max types of info per event type is not defined, there would be differences in the types of data contained in event that programs would allow.

So, the difference between attributes and events, is a way to standardize the amount of info that may be associated with a fact. Thus, if the definition of an event type specifies the info that may be transferred (i.e. may or may it not have a date, references to persons, to locations, to groups, notes, sources etc), this will provide the same capabilities in terms of specification as having a difference between attributes and events.
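
One way to picture such per-type definitions is a table of which parts each event type may carry, as in this Python sketch. The type names, the part names, and the decision to accept anything for undefined (for example user-defined) types are assumptions made for the example.

```python
# Every kind of information an event record might carry (illustrative list).
ALL_PARTS = {"value", "date", "persons", "places", "groups", "notes", "sources"}

# Hypothetical definitions: which parts each standard event type allows.
EVENT_TYPES = {
    "EyeColor": {"value", "notes", "sources"},
    "Birth": {"date", "persons", "places", "notes", "sources"},
}


def disallowed_parts(event_type, record):
    """Return the parts of a record that its event type does not allow."""
    # Assumption: a type with no definition is allowed to carry anything.
    allowed = EVENT_TYPES.get(event_type, ALL_PARTS)
    return sorted(set(record) - allowed)


print(disallowed_parts("EyeColor", {"value": "blue", "places": ["Oslo"]}))
print(disallowed_parts("Birth", {"date": "1845", "persons": ["I1"]}))
print(disallowed_parts("MyCustomType", {"value": "x", "date": "1900"}))
```

A receiving program could use the same table to decide how much screen space or which display form a given type needs, since it knows in advance what may arrive.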

The problem will be with user defined event types, although it would be possible for the submitters to indicate the max types of info that he considers necessary - but different submitters will have different opinions - and that happens today as well - some choose attributes and some events.

Well, I have no conclusion, but urge you to think about the implications for programs - and there are probably other aspects of programs that I have not considered.

If there is a need to have a date for an attribute, that fact type should be encoded as an event.
AdrianB38 2010-12-05T09:51:36-08:00
"possibly a short description that would contain the name of user defined attributes (the current meaning of attributes) when the name can not be carried by standard TAGS"
I certainly want to see the ability to create user-defined types of events (and properties / attributes / characteristics / traits / whatever). Was this what you meant?

"all facts in an event list may also use more screen space than necessary" I wouldn't worry about this - I think it's pointless to try and cram everything in one box. Put a summary in a scroll-bar equipped box, with the detail of the selected item below. No, you can't see everything all at once, any more than you can see all pages of a Word document at once. I think people can live with it.

"The problem will be with user defined event types, although it would be possible for the submitters to indicate the max types of info that he considers necessary" Hmm. If you're envisaging the user being able to add extra attributes to an event, I'm not sure how much the application can interpret those extras. If the data is just stored in (say) XML(!) it'll store easy enough. Don't try and process it - just produce a _very_ simplistic display in a text box, allowing update. It's effectively only text to the app.

"some choose attributes and some events" indeed, and I'm relaxed about it - if someone creates a user-defined item, there are no guarantees that anyone else will understand it, nor will the program.

"If there is a need to have a date for an attribute, that fact type should be encoded as an event" Did you really mean that? I think just about any "attribute" (i.e. PACT fact) will be capable of having a date - e.g. names change, physical descriptions change, an SSN is given at a point in time. There isn't an event in any meaningful sense for physical description unless you want to turn things on their head and have a "Described event" - but then that's odd compared to how people say things, and the description could be given years later. Date is a red-herring, in my view, that the English language leads us into.
ttwetmore 2010-11-27T11:13:47-08:00
Events, Properties, Characteristics and Facts
The current Individual Data Elements page states that there are no real differences between these four things. I disagree.

There are three words that are clearly synonyms for the same concept -- properties, attributes, and characteristics. All three imply the ascribing of some value of some property to some entity. The entity is not part of the PAC (property, attribute, or characteristic, take your pick), rather the PAC applies to the entity.

An event is an entirely different thing. It does not assign a particular value to a particular property of some entity.

In the ontology of objects critical to genealogy, events and PACs are worlds apart. An event is an entity type, what is clearly a stand-alone, top level, record-level object. An event is not a PAC of anything. Trying to twist concepts around by maybe saying that an event is an attribute of the group of people who were involved in the event seems a wholly contorted stretch of the imagination. PACs are never thought of as top-level, stand-alone objects, because they can't stand alone. They exist only in so far as they describe some aspect of the state of a real stand-alone object. Hair color is a PAC of a person. A date is a PAC of an event. A role player is a PAC of an event. An occupation is a PAC of a person. What in the world is the top-level thing that an event would be a PAC for?

Tom Wetmore
ttwetmore 2010-12-03T12:00:41-08:00
Since this topic started I found another reference that called a PAC a trait. So PACs should be PACTs.


Do you have any other favorite synonyms for this concept we can add?

Tom Wetmore
brianjd 2010-12-03T12:21:49-08:00
I would disagree a bit here, but only in a minor way. A fact could be anything about ... well ... anything. When I speak about a fact it usually is meant to encompass everything evidence related in regards to genealogy. A fact doesn't have to be a correct fact. My mother gave me some facts about my grandmother that I later disproved. Until that time they were facts given to me by the daughter of a person I was researching. It was considered a reliable source. My mother's source was her mother. So, you'd hope the actual person would be authoritative on facts about oneself.

All that aside, I think we probably all agree we need to have an entity that implements PACTs. What are we going to call it? I'm assuming we are only going to create one? I like Attribute, as it has a very neutral perspective.
AdrianB38 2010-12-03T15:19:40-08:00
"I like Attribute, as it has a very neutral perspective"
Unfortunately, as some Data Modelling mavens will tell you, Attribute has a very specific meaning in data modelling, and having an Entity Type named "Attribute", which results in talk of the attributes of the Attribute entity type, will cause no end of confusion...

I prefer Characteristic of the 4 because Property is also capable of confusion (with both real estate and object oriented terms), while Trait - I don't know, doesn't really seem to quite hit the mark for me.
ttwetmore 2010-12-03T16:49:33-08:00
SSShh. I won't call an Attribute an Entity Type if you won't. SShh. Hope no one is listening.

I think it's better to treat an attribute as an, ahem, attribute of an entity, rather than as an entity unto itself. Then you don't have the problem.

I prefer a modeling world with only three major concepts:

Nouns, aka entities/objects/instances belonging to types/classes.
Adjectives, aka PACTs, that give the nouns their individual PACTs.
Verbs, aka operations/functions, not usually in models but implemented by applications that add and modify models.

And I think of the relationships between objects as being implemented by special attributes that cause one entity to refer to another. These special attributes, which really aren't all that special, are attributes whose values refer to other entities (gosh, let's call a spade a spade, they're pointers), but the attribute itself doesn't need to be thought of as an entity.

Yes, we can get all formal and object-oriented and pedantic and say that an Attribute is an Entity type, but what does it really help? It is confusing to explain and even more confusing to be explained to.

(So if you want to call an attribute an attribute you won't have to twist my arm.)

Tom Wetmore
greglamberson 2010-12-03T17:59:02-08:00
What's not to like here? I agree with everyone! Wahoo!

Here's a request: Will someone besides me actually edit the main page to reflect what you're saying here? That would be very helpful.
brianjd 2010-12-03T22:41:15-08:00
Well, I'm just wondering how we can call an attribute or a PAC or a PACT an element if it itself has elements within it. When you start adding things to an element it ceases to be an element. If we want to call it an element, I don't care - I was just trying to speak professionally. We can call it a jellybean for all that matters. Whatever works for the majority will work for me, and I'll not worry about how we use technical terms to describe things. I've debugged far more cryptic code. But can we please pick one name for it? I say attribute, but I seem to be alone on that bridge, maybe because I use Gramps and that's what Gramps uses. Whether we choose property, attribute or characteristic there is the chance for confusion when discussing it.
All those names simply are to provide extra information, about some person, place or whatever we decide to let it describe, that isn't supported by the model. I think it is important to have it so that people can customize and add extra refinements. Maybe we could call it "info". I would prefer any name that at least has some logical connection to its purpose.

I'd be glad to update the main page, but, do we really want to start an edit war on the main pages? I change this to that, Tom changes that to something else. Doug changes something else back to this. You just can't win. I'm still pretty new at all this, so I'm trying real hard not to offend people and to play nice and politely (I can be quite annoying I'm told) ;'S
greglamberson 2010-12-03T22:58:43-08:00

The way I have been thinking of things is that various elements are actually groups which will be referred to and used in several places.

Regarding terms, I think of an element as something that is only a piece of a record in a database. An entity is something that is a record. Anyway, you're right that we need to codify a glossary for our own use. I'll figure that out tomorrow.

Regarding editing, I'll take care of it if no one else will. All this resides on my schedule under "things to look at tomorrow."
ttwetmore 2010-12-03T23:47:45-08:00

I'm wondering about your use of the term "element." If you are referring to the XML meaning of the word, and if BG uses XML for archiving or transporting, then everything in the BG model will end up being mapped to elements. A Person would be a top-level (meaning something like being inside an overall wrapper element) element, an attribute would be a non-top-level element. I'm just trying to figure out what you're trying to say, since I don't see the term element being used in this discussion thread.

Just to say a little more, as I often do, consider...

<person>
<name>Thomas Trask Wetmore IV</name>
<father> <name>Thomas Trask Wetmore III</name></father>
<mother> <name>Joan Marie Hancock</name></mother>
</person>

(This doesn't correspond to any proposed BG model I believe, and in my opinion, should NEVER be considered as a possible one.)

Think about the elements here. The top one is <person> and this whole element represents a Person entity. The person has a name, one element deeper in the record. This <name> element is an attribute. The <father> and <mother> elements are interesting. They ARE attributes because they are one level deep in an object element so they describe some attribute of the person (you have to think of a father as being an attribute of a person, a little awkward to do, but I hope possible). And the values of these person attributes are really object elements that are embedded in another object element. (This is why I said that this doesn't correspond to any BG model, because I don't think there is any model out there yet that would put a Person entity inside another Person entity.) But still, the <name> elements inside the parent entities are attributes of the parent entities that are in turn attributes of the top person.

In XML each element should represent an attribute of the element that contains it, all the way down. Some of these attributes have simple values (hair color, date of an event), and some attributes have other objects as values. If those objects are naturally the top-level ones, then the attribute values should be references to those other objects and not the objects themselves. The important distinction here is the same one as the computer programming distinction between an object and a pointer to the object.
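The object-versus-pointer distinction can be sketched in a few lines of Python. This is only an illustration under assumed names: the `Person` class, the `father_ref` and `mother_ref` fields, the ids `p1` to `p3`, and the `store` dictionary are all hypothetical and are not part of any proposed BG model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Person:
    id: str
    name: str
    # These hold the id of another top-level Person (a "pointer"),
    # not an embedded Person object.
    father_ref: Optional[str] = None
    mother_ref: Optional[str] = None


# Hypothetical store of all top-level Person records, keyed by id.
store: Dict[str, Person] = {}


def add(person: Person) -> None:
    store[person.id] = person


def resolve(ref: Optional[str]) -> Optional[Person]:
    """Follow a reference to the top-level record it names, if any."""
    return store.get(ref) if ref is not None else None


add(Person("p1", "Thomas Trask Wetmore III"))
add(Person("p2", "Joan Marie Hancock"))
add(Person("p3", "Thomas Trask Wetmore IV", father_ref="p1", mother_ref="p2"))

father = resolve(store["p3"].father_ref)
print(father.name)
```

Each Person is stored exactly once, and the parent attributes of the child carry only ids, so the same parent record can be referred to by any number of children.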

Anyway, the whole point, to me at least, is that an element is the general mechanism that XML uses to structure hierarchical objects, and that entities and attributes are terms that BG (and just about every other model created by the mind of humankind) uses to describe object type things (nouns) and property type things (adjectives) that apply to the object type things. The values of attributes come from a wide variety of spaces. The value of a date attribute has a complex structure that BG is struggling with. The value of a name attribute, ditto. And it is important to realize that the values of many kinds of important attributes are either other entities or references to other entities.

In most models it is important to distinguish between the top-level entities, entities that will never be the direct values of any attribute, from entities that can be the values of attributes. Persons and Events are good examples of the former. Dates are a good example of the latter. A date is certainly an entity. But almost no one would suggest that a date should be a top-level entity in an ordinary genealogy application. Just let the date be the value of an attribute that is assigned to an entity. An exception might be a model that represents the historical events that occurred every day. In such an application there would be a top-level Date entity and its attributes would include events that occurred on those days. This is a fabulous reversal of the genealogy model in which Events are top-level and Dates are values of attributes to the new model of Dates as top-level and events as attributes.

Note that in the GRAMPS model, the Place entity can be a top-level entity, one that stands alone as a record and can be referred to by any number of other entities, or a Place can be an attribute that is placed inside an Event entity. In the former case the user is deciding that we want to represent some Places as general objects (say, New York City) because many other records will likely refer to it, but very specialized Places can be tucked away inside the Event that happened there.

I did the same thing in the DeadEnds model for Notes. Some Notes can be general and apply to many Persons or other things, so it's best to have Notes as top-level entities and let other entities refer to them. (This is how GRAMPS handles Notes.) But some Notes are so specific they apply only to a specific entity (or even more specifically, to an attribute of an entity!). This is the GEDCOM approach to notes, as you can attach a NOTE line to just about everything. In the DeadEnds model I've allowed for both types of note implementations.
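As a rough sketch of allowing both kinds of note, an entity can hold a list whose items are either an inline note or a reference to a shared, top-level note. The class names `Note`, `NoteRef` and `Event`, the id `n1`, and the note texts below are made up for illustration; this is not the DeadEnds or GRAMPS syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Note:
    text: str


@dataclass
class NoteRef:
    note_id: str  # id of a shared, top-level Note


@dataclass
class Event:
    kind: str
    # Each item is either an inline Note or a reference to a shared one.
    notes: List[Union[Note, NoteRef]] = field(default_factory=list)


def note_texts(event: Event, shared: Dict[str, Note]) -> List[str]:
    """Return the text of every note on the event, following references."""
    texts = []
    for item in event.notes:
        if isinstance(item, NoteRef):
            texts.append(shared[item.note_id].text)
        else:
            texts.append(item.text)
    return texts


# Hypothetical data: one shared note and one note specific to this event.
shared_notes = {"n1": Note("General note shared by many records.")}
baptism = Event("baptism", [NoteRef("n1"), Note("Note specific to this event.")])
print(note_texts(baptism, shared_notes))
```

The reader of the record does not need to care which form was used; the choice only affects whether the note can be shared.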

In the example I gave I deliberately put Persons inside Persons to show something we probably don't want to do, but to try to draw some distinctions between the world of elements, which is the totality of XML world (skipping XML attributes and name-spaces for the time being), and the world of entities and attributes (not to be confused with XML attributes). The two worlds are completely separate but map back and forth. Element is a term from one of them, and Entity and Attribute are terms from the other (again, ignoring the XML attribute concept).

Tom Wetmore
brianjd 2010-12-04T08:56:27-08:00
Sorry, I should explain. I'm a classically trained physicist. I have a very specific vision of what an element is. When I think of an element, I think of something that is a basic element, that can't be broken down anymore with the exertion of enormous energies. My vision of an element is a quark.
brianjd 2010-12-04T09:45:56-08:00
How does one delete a comment? That previous comment should say "without the exertion...".

A big problem we are having is we are mostly using words as we know and use them. Unfortunately, we are all using different and valid meanings for the same words.

We need to decide on a syntax to use, so we all use the same words to mean the same thing. Otherwise we're talking past each other. Greg is working on one now, I see.

What I would really like to see, and maybe it has been done, is a proposed model and/or a suggested list of terms to start with from Tom, Doug and Louis, the three who are actually programming genealogy software.

While getting the general community to come up with ideas from the ground up, I'd like to know what Tom, Doug, and Louis would include in a BG model. We know what they are using in their current GEDCOM-restrained model, but we don't know what they'd like to see in a new Better GEDCOM. It could be they've thought it out in more focused detail than any of us. One of them might already have the just-right model. Save us a lot of tinkering too. ;')

Not that I don't love to tinker with things, take them apart and figure out how they work, but sometimes a schematic is a good starting point for designing the best overall improvement.
ttwetmore 2010-12-04T09:56:01-08:00
Ah, I see; thanks. I think in your terminology an element is a value from an atomic class, that is, a class whose values are not usefully broken down further BY THE APPLICATION in question. Good examples are identifiers, strings, booleans, enumerations, numbers, UUIDs, URLs. Some might add dates and names to the list, though there wouldn't be widespread agreement.

Tom Wetmore
dsblank 2010-12-04T10:33:55-08:00

I must admit that I am not the right person to be able to give you a well-thought out list of changes that GEDCOM needs. I know the computer programming part of the problems very well, but I'm an amateur when it comes to knowing the relationships between the process and sources, etc.

But I try to point BG to Gramps people or discussions to connect the right people together. I've listed a set of links under "Shortcomings Of GEDCOM" (to the left) under Gramps Extensions. I encourage you to read through that.

Greg and I have both extended invitations to the Gramps people to participate, with only a few joining in. I understand why they aren't participating here: why argue over here about what could be done, when they can just do it? For example, there is currently a design being discussed and coded for a new method of extending Sources to handle "large" sources (like Censuses):


There are many people there discussing the right way to do this (e.g., so that it still lines up with GEDCOM as best it can, but allows a wide range of uses). And people are, at this minute, writing code to test out these ideas.

I guess my point is that many in the Gramps community feel like they are creating a "better GEDCOM" every day with Gramps XML, so why go back to the drawing board to rehash solved problems? When you have a limited amount of time, hackers want to hack, not design a standard.

The best I can do is point you to their discussions, and what we have done so far. I'm glad Tom and Louis are participating, and I wish there was a better way to get the Gramps knowledge base into this betterGEDCOM project, too.

louiskessler 2010-11-27T12:07:27-08:00


I don't mind your explanation at all. I actually don't have trouble with Events as entities and PACs as attributes if defined that way.

As long as PACs are allowed to have dates associated with them, I'll be happy.

Please feel free to update the Wiki pages with your ideas.
ttwetmore 2010-11-27T12:18:17-08:00

Thanks for not getting mad at me.

I have no problem with PACs having dates. PACs are naturally recursive and I believe they can have their own sub-PACs. (Of course, by saying that I obviously am granting PACs a certain level of entity-hood -- just not full entity-hood!).

Dates in PACs make a lot of sense. They can indicate when the PAC came into existence (when I dyed my gray hair back to brown!), the date range over which I held that occupation, and so on. I suppose we could be pedantic and distinguish between inherent PACs, things that are permanent properties of an entity, from PACs that can change, but it's probably not all that valuable. Sex is clearly an inherent PAC of a person (please don't bring up sex changes), while an occupation is clearly a transitory PAC.
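A recursive PAC with an optional date might look something like the following Python sketch. The `PAC` class, its field names, the free-form date string, and the sample values are all assumptions made for illustration, not an agreed BG structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class PAC:
    name: str
    value: str
    # Free-form date or date range; the format is an assumption here.
    date: Optional[str] = None
    # PACs are recursive: a PAC can carry its own sub-PACs.
    sub_pacs: List["PAC"] = field(default_factory=list)


def flatten(pac: PAC, depth: int = 0) -> List[Tuple[int, str, str, Optional[str]]]:
    """Walk a PAC and its sub-PACs depth-first, recording nesting depth."""
    rows = [(depth, pac.name, pac.value, pac.date)]
    for sub in pac.sub_pacs:
        rows.extend(flatten(sub, depth + 1))
    return rows


# Hypothetical example: a transitory PAC with a date range and one sub-PAC.
occupation = PAC(
    "occupation",
    "carpenter",
    date="1850-1870",
    sub_pacs=[PAC("employer", "shipyard")],
)

for row in flatten(occupation):
    print(row)
```

An inherent PAC such as sex would simply leave the date empty, so no separate type is needed to tell inherent and transitory PACs apart.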

Tom Wetmore
louiskessler 2010-11-27T13:27:38-08:00

There's no way I can get mad at you. I'm glad that you took the time to comment and made it clear that there's a difference when events are treated as entities and are not attached to anything. Under the old GEDCOM definition, that was not the case.

But in the meantime, you came up with an interesting new acronym: PAC. It might catch on. :-)
ttwetmore 2010-12-03T17:11:00-08:00
The Incredible Lightness of Being
I am in favor of the Family entity, and though I would prefer it were its own entity type, I think it would be okay if it were treated as a kind of Group. A Group is a collection of references to role players, a few other attributes (PACTs) for the Group, and the semantics that stem from the kind of Group it is.

At this level of abstraction, though, one might wonder why an Event isn't a kind of Group also. It's a bag of references to role players, too, with PACTs that apply to it, including two fairly specific PACTs (date and place).

What we have in the genealogical model world is an example of highly networked data. Each entity represents something with semantics; each entity has PACTs germane to its kind; and each entity refers to other entities that bear special relationships with it.

One can imagine entity types at two extremes. At the first is a single entity type, call it a Thing, that covers every kind of entity a genealogist might be interested in. At the other extreme is having a separate entity type for every separate kind of thing a genealogist might be interested in.

For example, we have pretty much decided that there are certain Things that are so important to genealogy that we are going to bless them with their own being, the most obvious ones being Persons, Events and maybe Places.

But then there is the middle ground. Consider Sources. This is an entity type in most genealogical models, but it is used for many kinds of things -- books, church registers, microfilm rolls, and in some models for larger things, such as libraries and archives, and in some models for smaller things, such as pages in a book or entries in a church register. Why do we have certain kinds of entities of the more specific kind like Persons and more general kinds such as Sources? I think the answer is somehow related to how interested we are in the entity type or how important the entity type is. Persons and Events are critical to genealogy and we think about them a lot, need lots of them, work with them every time we're doing genealogy. Sources are important to genealogy, sure, but we really want to just record them and never think about them again. They're "let's get them out of the way quickly" entities that just go along for the ride after they're created.

The Group entity is in the middle area also. Most people think of it as a group of people that has clear (fairly clear?) semantics -- a family, a club, a jury, a military regiment, and so on. Why do we want Groups rather than a special entity for every kind of group? It's probably because we don't want to get into the pickle of having to define every possible kind of group ahead of time. What if we forget an important one? So it's just better to have a single entity type Group, with a type attribute that can be one of lots of pre-approved types, but with an option for users to create their own. Fair enough.
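One way to picture a single Group type with pre-approved types plus user-defined ones is the Python sketch below. The set `APPROVED_GROUP_TYPES`, the `Group` class and its fields are hypothetical; the listed types are just the examples mentioned above, not an agreed list.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical list of pre-approved group types; a real standard would define its own.
APPROVED_GROUP_TYPES: Set[str] = {"family", "club", "jury", "military regiment"}


@dataclass
class Group:
    group_type: str
    # Each member is a (person id, role) pair: a reference plus the role played.
    members: List[Tuple[str, str]] = field(default_factory=list)

    @property
    def is_custom_type(self) -> bool:
        """True when the user supplied a type outside the pre-approved list."""
        return self.group_type not in APPROVED_GROUP_TYPES


family = Group("family", [("p1", "father"), ("p2", "mother"), ("p3", "child")])
crew = Group("ship's crew", [("p7", "carpenter")])

print(family.is_custom_type, crew.is_custom_type)
```

Nothing is rejected here: an unknown type is simply flagged as user-defined, so a forgotten kind of group never blocks data entry.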

But are there other entity types that are significant in genealogy that some genealogist might want to encode in a database record? How about a Ship that an immigrant ancestor arrived on, or information about a Village that a person grew up in (and how is this concept different than a run-of-the-mill Place entity type?). Or how about a cemetery, or a court house?

The proper overall approach in modeling is pretty obviously an object-oriented one, trying to divvy up all the possible ways of applying an inheritance structure on the universe of all interesting things of importance to genealogists.

When one makes the decision that a Family type is not needed because a Group type is good enough, one is making a judgement on the importance of the concept. Everything is a Thing, but we partition up the world of Things, by a network of inheritance, to get the set proper for doing genealogy. When we decide that every possible kind of group of persons can be handled by the same Group type, we are making a decision like that being made for Sources, that we just don't have to distinguish them that much.

Consider an Event. An Event is a Thing, and if you think about it, it's a lot like a Group, that is, a bag of role references to Persons, important PACTs like Date and Place, and other PACTs specific to the kind of Event. But I don't think we'd seriously consider getting rid of the Event type by making it a kind of Group also, would we? Please, please, say no.

How about entities that have references to more than only Persons? For example a ship arrival record could have references to the Person arriving, the Place departed from, the Place arrived at, the Dates of the departure and arrival, the Ship sailed in, and so on. But isn't this "just" a kind of Event? (I think so.)

Do we need a general entity type, call it Entity (see the DeadEnds model), to enable us to record information about any kind of entity that crops up during our research? I think we do. Real easy to deal with Ships then.

Excuse the topic title, but most other discussions bear little resemblance to their topic either.
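To make the idea concrete, here is a loose Python sketch of a catch-all entity type used for a ship, with an arrival Event whose role references point at more than only Persons. The class names `Entity` and `Event`, the ids, the role names and the sample values are assumptions for illustration; this is not the DeadEnds syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    id: str
    kind: str  # user-chosen kind, e.g. "ship"
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Event:
    kind: str
    date: str
    # Each role is a (role name, id of the referenced record) pair.
    roles: List[Tuple[str, str]] = field(default_factory=list)

    def refs_for(self, role: str) -> List[str]:
        """Return the ids of all records playing the given role."""
        return [ref for name, ref in self.roles if name == role]


# Hypothetical records: a ship entity and an arrival event that refers to it.
schooner = Entity("e1", "ship", {"rig": "schooner"})
arrival = Event(
    "arrival",
    "date unknown",
    [("passenger", "p3"), ("ship", "e1"), ("departed from", "pl1"), ("arrived at", "pl2")],
)

print(arrival.refs_for("ship"))
```

The ship gets its own record with its own attributes, while the arrival stays an ordinary Event that happens to reference it.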

Tom Wetmore
brianjd 2010-12-04T08:44:39-08:00
I like the idea of an Entity entity. In fact I like the DeadEnds data model. The thing I like about the Entity entity is that, as long as an application makes allowance for it, you'd never have to worry about losing content. The user could create their own buckets to drop things in. Let's call it Thing! Just kidding.

My perspective is clouded by my years of programming experience and training as a scientist. I'm a minimalist in some respects. With that in mind my take on the Family vs Group discussion is families are specialized versions of groups. Whether they are broken out in the model as separate entities or not is irrelevant. If I ever write a program for the BG model, I would wind up combining the two as a single object internally, but the users would see two.

So, I'm already logically combining them, because it makes the model smaller. I don't want to wind up with a 10,000-page standard. Preferably, I'd like it to fit on one page, like the DeadEnds model. Obviously, there are more details in the DeadEnds model, but you can see all the core of the model on a single page.

However, the user side of me wants to see all kinds of richness in the model. I've used LifeLines, and probably would still be using it, but I think when I used it it didn't have a GUI front end. Then I discovered Gramps. I love Gramps. Well, except that it doesn't remember the last things that I've done in a session, and I sometimes have to do three or four clicks and type three or four keys to input the next record. Occasionally, I have to type in whole words or phrases, and the occasional sentence or two. Yeah, I've gotten lazy. lol. Long ways from DOS 3.1.

There is something I really love about the Gramps model, shared information and specific to a record information. This applies to many entities. I'll give two examples.

When entering a source, I can pick an existing source to add for a piece of evidence, and then I can add a specific extension to that piece of evidence as a further mini-source entry. So, a source within a source. So say I'm going through The History of Harlem, by James Riker, 1908. I've added a list of the children of Resolved Waldron. Some are on one page and others on others. I can add the book as a source, then for each child a specific entry.

The other example is for location. One can add specific information about just the place of the current piece of evidence one is working on.

What I'd like to see accommodated in the model, and I've seen others suggesting this, is to be able to have a place or a source, and then have an extension part specific to the individual record it is being tied to. For places, these extensions could be address-parts. Like "plot 2, Section C. Lutheran Cemetery" or "412 Maple Street", or "Steerage". For sources it could accommodate things like "Line 24, Page 256, Volume III".

Although, sometimes, I'd like to share that specific entry too - but that's a data entry thing, mostly.
AdrianB38 2010-12-04T13:16:34-08:00
Tom - you got me with ships - I have a number of free-standing notes in my GEDCOM file containing the abbreviated history of ships that my relatives went on. Why? Because I want to....

Never thought of it before... But right now I can't think of a good entity type name - most of the generics (e.g. "Object") are already taken up in Data Modelling and other IT terms!
ttwetmore 2010-12-04T14:37:37-08:00
Quoting Brian (thanks for the kind words by the way): "When entering a source, I can pick an existing source to add for a piece of evidence, and then I can add a specific extension to that piece of evidence as a further mini-source entry. So, a source within a source. So say I'm going through The History of Harlem, by James Riker, 1908. I've added a list of the children of Resolved Waldron. Some are on one page and others on others. I can add the book as a source, then for each child a specific entry...The other example is for location. One can add specific information about just the place of the current piece of evidence one is working on...What I'd like to see accommodated in the model, and I've seen others suggesting this, is to be able to have a place or a source, and then have an extension part specific to the individual record it is being tied to. For places, these extensions could be address-parts. Like "plot 2, Section C. Lutheran Cemetery" or "412 Maple Street", or "Steerage". For sources it could accommodate things like "Line 24, Page 256, Volume III"."

Take a look at these two rules from the DeadEnds model:

sourceRef :: sourcep : [ id sourceRefAttr* ]
sourceRefAttr :: attribute

These are the rules for creating source references, that is, the "pointer attribute" in one record that refers to its source record. The first rule says that a source reference is the keyword sourcep followed by an id attribute followed by any number of "source reference attributes". The id attribute just gives the UUID of the source record so it can be found. The additional source reference attributes serve the purpose you outlined in your points. For example, the source could be a book, that is, the UUID points to a source record for a book. But you could add source reference attributes, the most obvious one being a page number, to further modify the location in the source.

And if you look at the DeadEnds model you will also find:

placeRef :: placep : [ id placeRefAttr* ]
placeRefAttr :: attribute

These are analogous to the source case and allow any reference to a place entity to provide the extra info as you suggest.

I think the idea of letting the references provide specific details about the objects they refer to serves to better organize and even simplify the data objects. If you had to have a separate source entity for every page in a book you extracted information from, I think you would not like the solution. With one clean source for the book, with the references providing page numbers, things are simpler and in my opinion cleaner.
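In Python terms, a reference that carries its own attributes might be sketched as below. The `SourceRef` class, the `describe` helper, the id `s1`, and the attribute names `volume`, `page` and `line` are assumptions for illustration (and a plain string stands in for the UUID); only the idea of an id plus any number of reference attributes comes from the rules quoted above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SourceRef:
    id: str  # id of the top-level source record (a UUID in the rules above)
    # Any number of reference-specific attributes, e.g. page or line.
    attrs: Dict[str, str] = field(default_factory=dict)


# One clean source record for the whole book (hypothetical store and id).
sources = {"s1": {"title": "The History of Harlem"}}


def describe(ref: SourceRef, sources: Dict[str, Dict[str, str]]) -> str:
    """Combine the shared source record with the details held on the reference."""
    title = sources[ref.id]["title"]
    detail = ", ".join(f"{key} {value}" for key, value in ref.attrs.items())
    return f"{title} ({detail})" if detail else title


# Two records cite the same book but point at different places in it.
first = SourceRef("s1", {"volume": "III", "page": "256", "line": "24"})
second = SourceRef("s1", {"page": "257"})

print(describe(first, sources))
print(describe(second, sources))
```

The same shape would work for a place reference carrying an address part such as "412 Maple Street".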

Tom Wetmore
ttwetmore 2010-12-04T14:47:01-08:00

I'm with you. I've kept information about ships too, which is why I used it as an example. I have found some nice sources (e.g., Lloyd's registers) that can sometimes provide good historical information about ships. I have found, over the years, that ship arrival records are among the most exciting and goose-pimpling records one can find when doing genealogy. Seeing my great grandfather arrive on a small schooner when he was a four-year-old is still one of the most serendipitous experiences I have had in researching the family. Then researching the history of the schooner and discovering that my great great grandfather was one of the carpenters who built the schooner just added to the pleasure.

When I found the record of my wife's peasant grandmother arriving in Philadelphia from Poland (West Prussia at the time) it was a tear-inducing moment. Yeah, ships!

Tom Wetmore