As formulated on the GOALS page, we think it would help to offer a test suite of data for software programmers.
To show off the features and possibilities of Better GEDCOM, we should write/create separate files for various features. This way, it will be easier to see what the importing application did with the information in the BG-file.

What should be in a BetterGEDCOM test suite? (Please edit / add your ideas!)

I believe a really well thought-out test suite would be of enormous value for programmers and would help BG become successful.
I also think that building this suite will be an enormous task!

End-users should try and think of difficult real-life (and imagined) examples and just write them down. Try to "condense" as many difficulties into a few constructed "cases".
Techies would then build these examples into BG-files. The specification doesn't have to be finished for this. Working on examples while formulating the definitions of BG will help with finding and fixing problems.


Test Cases

  1. The mystery Aunt
  2. GJ Cases


Discussions


Comments

testuser42 2011-01-09T05:37:32-08:00
Collect case studies!
Please think about interesting problems that have come up during your family research.

Then just write the whole "case" down in plain text, either on this page or on a new sub-page (but please don't use the discussion pages). You don't have to give all the source data or the true names and places, just as much as is needed to understand the problem(s) and your reasoning on the way to a possible conclusion.


Then the techies (Tom, Louis, Mike, Adrian, Christoffer, someone from GRAMPS,...) can show how this case would end up looking in their file format.

Then we can discuss differences and hopefully see what works best.
GeneJ 2012-01-21T07:46:14-08:00
A few thoughts about the requirements for cases:

1. Cases that follow the research model--these should begin with a focused goal (a genealogically relevant question, that is, a question of identity or relationship) and proceed through the research cycle to a conclusion.

2. Cases that follow the research model, but take very different paths:

(a) Cases where the genealogically relevant question is solved by relying only on the application of direct evidence
(b) Cases where the genealogically relevant question is solved by relying on the application of direct and indirect evidence
(c) Cases where the genealogically relevant question is solved by relying on the application of all forms of evidence--direct, indirect, circumstantial, negative

3. Cases that involve conflicting evidence (of all forms), such that the genealogically relevant question may be solved only by weighing all forms of evidence--direct, indirect, circumstantial, negative.

4. Cases, as above, more specific to materials in increasing or different degrees of complexity:

(a) Materials intended to provide particular information directly, from within the four corners of the document. For example, a birth record is intended to answer specific, genealogically relevant questions about a birth fact; a death record is intended to answer specific questions about the death fact.
(b) Materials intended to provide particular information, but from within the four corners of the document, that information either does not, or is not intended to answer genealogically relevant questions. For example, estate documents, deeds/real estate transactions; some census. These should range from simple to more complex.
(c) Materials that rely on the identity of the creator, event, annotation and provenience or some other context in order to make some determination about information therein. Of particular interest would be cases that contain multiple references or perspectives on what might or could be the same person or event. These should range from simple to more complex.
testuser42 2011-04-21T02:59:59-07:00
"Left 11 children"
From http://bettergedcom.wikispaces.com/message/view/Defining+E%26C+for+BetterGEDCOM/37924068#37977562
GeneJ says:
testuser42 2011-04-21T04:03:28-07:00
My idea would be to record just what the source says. That's what Personas=Evidence Person Records do.
In this case, it would be a "Persona" Record for the deceased, and in that Record the 11 surviving children would be mentioned. This could be in a Note of some type, or, maybe better in a special tag. Something like
<FACT type="Number of children">11</FACT>
I would not want to have eleven empty "Persona" Records.

  • When you scour the vital records, if you only find 11 vital records, what happens to that first entry?

That first entry will never be changed, since it only contains what that specific Source contained.
If I now have evidence of the 11 children, then each of these children will be getting their own "Persona" (linked to the source it came from, and only containing the information that is in that source).
If I want to link the children to the father, this should also follow the evidence:
  • Say I have the birth record that mentions child, father and mother. This will produce 3 new Evidence Persons (=Personas) and one Evidence Birth Event. The Birth Event will link the three Personas.
  • Then, after making the conclusion that the father from the birth record source is the same person as in the obituary source, a new Person Record will be created to link both these "Personas". This is then a "Conclusion Person Record". In this Record, I can place my reasons for coming to the conclusion that the two people from the sources are in fact the same real-life person.
  • Now repeat 10 times ;) Create a new Persona for every child, containing just what the source says. Then link the child Persona to the latest Conclusion Person for the father via a new higher level Person Record. This will make a big tree for the father, but that's not a problem for computers. You will always see just the same display with more and more children added. You only extract information from sources, creating new "Personas" and "Events" from these sources. Then, you click on a button like "connect persons" and maybe then give a reason why you want this connection established. The rest happens "under the hood".
  • Alternatively, using a 2-tier Model, there would only be one Conclusion Person for the father, and links to the kids and reasoning would be added one after another. This might be more "human readable", but may be a bit more difficult to dis-entangle(?) if you want to undo a conclusion - or is there no difference?

In short, if I find things that change my understanding of the real life human being that the obituary was about, then
  1. these things will be recorded in their own specific Persona linked to their specific Source, and
  2. the new Persona and the old one will be linked with a Person Record that's one level higher. In this Person Record I can put my new findings or problems, maybe linked to an Administratory/Research Log Record.
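
To make the layering concrete, here is a minimal Python sketch of the idea described above. It is only an illustration, not a proposed BG data model: the class names `Persona` and `ConclusionPerson` and their fields are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Persona:
    """Evidence person: holds only what one source says (hypothetical class)."""
    source: str
    facts: Dict[str, str] = field(default_factory=dict)


@dataclass
class ConclusionPerson:
    """Higher-level record that links lower-level records and keeps the reasoning."""
    links: List[Union[Persona, "ConclusionPerson"]]
    reasoning: str = ""

    def personas(self) -> List[Persona]:
        """Collect every evidence-level persona below this record."""
        found: List[Persona] = []
        for link in self.links:
            if isinstance(link, Persona):
                found.append(link)
            else:
                found.extend(link.personas())
        return found


obituary = Persona("obituary", {"Number of children": "11"})
birth_father = Persona("birth record, child 1", {"role": "father"})

# First conclusion: the obituary subject and the father in the birth record are the same man.
father_v1 = ConclusionPerson([obituary, birth_father], "Same name, place and period.")

# A later source adds another persona; a new, higher-level record links it in.
birth_father_2 = Persona("birth record, child 2", {"role": "father"})
father_v2 = ConclusionPerson([father_v1, birth_father_2], "Same parents named in both records.")

print(len(father_v2.personas()))  # 3
print(obituary.facts)             # the obituary persona is unchanged by later conclusions
```

Undoing a conclusion in this sketch just means discarding the higher-level record; the personas underneath stay exactly as they were extracted.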

  • When you then find a 12th record--do you create a persona for one child then deceased (or do you just divorce the family at this point)?
The first thing. You create a Persona, entering only what the source is telling you about that human being. All your conclusions will be in the next level, by linking the new persona to other Person Records.
If that 12th record doesn't say anything about the death, but you conclude that at the time of the father's obituary, this child was not alive anymore, then I would create a new Conclusion Person for the child that only says "died before date x" and give my reasoning linked to the Source "obituary", pointing out the 11 surviving children.

I've never mentioned a "Family Record" because there is no need for one. Of course, if the evidence in the source warrants a "Family" then you can or should create one. E.g. the obituary will probably speak of his family, so this would be a good reason to create one. It should only contain the information from the obituary! So, kind of like
0 FAM @f1@
  1 SOUR @s1@
  1 PersonRef @p1@
    2 ROLE father
  1 NumChildren 11 
    2 NOTE Left 11 children to morn.
which would solve the NumChildren very elegantly.
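
For anyone who wants to try reading a record like this, here is a small Python sketch that turns "LEVEL TAG VALUE" lines into a tree. It follows the line layout used in the example above (standard GEDCOM puts the cross-reference id before the tag, as in `0 @F1@ FAM`), and it assumes that levels never jump by more than one. `Node` and `parse_lines` are names invented for this sketch; `PersonRef` and `NumChildren` are just the hypothetical tags from the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    level: int
    tag: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


def parse_lines(text: str) -> List[Node]:
    """Build a tree from 'LEVEL TAG [VALUE]' lines, ignoring indentation."""
    roots: List[Node] = []
    stack: List[Node] = []  # stack[i] is the currently open node at level i
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        node = Node(int(parts[0]), parts[1], parts[2] if len(parts) > 2 else "")
        del stack[node.level:]  # close nodes at the same or a deeper level
        if stack:
            stack[-1].children.append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


sample = """\
0 FAM @f1@
  1 SOUR @s1@
  1 PersonRef @p1@
    2 ROLE father
  1 NumChildren 11
    2 NOTE Left 11 children to morn.
"""
fam = parse_lines(sample)[0]
num = next(c for c in fam.children if c.tag == "NumChildren")
print(num.value, "-", num.children[0].value)
```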

phooo, long post. hope it's clear.
testuser42 2012-01-15T12:15:47-08:00
Simple Test Cases for starting out and comparing
In TestSuite01 - Test Data Format, Tony said: I imagined a fictitious family (or set thereof) that embraced all the real-world situations and vagaries that we want to support.
Given that we may make the cases public (e.g. to show vendors something they cannot handle at present), or produce real test-data later on that's based on the test-cases, then I think the names and d.o.b can be invented at least :-)
I think a good start is an informal list of all the things BG must support or handle properly. We can then expand on those to create associated test cases.
and I have a think balloon above my head with a picture of the Munsters posing for a family portrait !!
I don't know the Munsters that well, so I can't say if they'd be perfect as an example family ;)

But, for starters, I think it would be good to put down on this page the informal list Tony mentioned. It would probably be condensed from the Better GEDCOM Requirements Catalog.

Also, I think examples that follow the possible step-by-step real-life process would be helpful:
- First there's a small bit of data you want to put into your program, let's say something you wrote down during a conversation with your grandma. How would an exported BG look at that point?
- Then you do some research about the topic, and find new sources and extract more pieces of evidence, and add these. How does the exported BG look now? (Would it look different if you added data in a few sessions and not in one chunk?)
- Later, maybe you'll find that one of the pieces you added before is not about the person you are researching. So you get rid of these connections, in whatever way your software provides. How might the exported BG look after this operation?

...and so forth, following the developments as they might happen.
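
One way to compare the exports from step to step is a plain text diff. The sketch below uses Python's standard `difflib` module; the export text is an invented placeholder format (there is no BG syntax yet), and `export_diff` is a helper name made up for this example.

```python
import difflib
from typing import List


def export_diff(before: str, after: str) -> List[str]:
    """Return only the added (+) and removed (-) lines between two exports."""
    lines = list(
        difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="", n=0)
    )
    # The first two lines are the '---'/'+++' file headers; '@@' lines mark hunks.
    return [d for d in lines[2:] if not d.startswith("@@")]


# Step 1: only the conversation with Grandma has been entered.
step1 = """person: aaaaa
  name: John /Doe/
  source: Conversation with Grandma"""

# Step 2: a birth certificate is added and tentatively linked.
step2 = step1 + """
person: ddddd
  name: Johann /Doe/
  source: Birth cert of Johann Doe
conclusion: zzzzz
  links: aaaaa ddddd"""

# Step 3: the Johann lead is rejected, so the link to ddddd is dropped.
step3 = step2.replace("links: aaaaa ddddd", "links: aaaaa")

for line in export_diff(step2, step3):
    print(line)
```

If undoing a conclusion only changes the linking record and leaves the evidence records alone, the diff between step 2 and step 3 stays this small.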
ACProctor 2012-05-15T08:35:13-07:00
Re: "Your example is very interesting and leaves me with a very positive opinion. The only significant difference between your model and mine seems to be the extra evidence layer, which acts to provide a "bag" of personas and places and the source information that describes where the evidence is from."

This is not a "model", though, Tom. I was merely trying to come up with a way of expressing test-cases that didn't presume a particular model or data-format. If the separation of evidence away from all reasoning and conclusion helps inspire a future direction for model development then that's good but it's not my main thought here. I've rushed a description of the units of evidence for the test-case we're discussing - I say "rushed" because I'm afraid I've left some of the inferences in the notes.

Louis - I apologise for mentioning that word [smiling here]. This notation was to be a semi-formal way of depicting just the evidence, separately from reasoning and conclusion, which obviously meant that each person that was referenced was merely an "evidence person". I think I should have left it at that since I don't want to confuse this notation with any existing data models or entity concepts. If a particular data-format has some concept of personae (...and there's a whole grey-scale of what that might entail) then there would be an easy way of generating them from this notation.

However, as I said, this shouldn't presume any particular data model or technique. In describing the "pure" evidence separately from the "thought parts" of the data, it was trying to find a completely neutral approach for describing a fundamental part of any data. I feel that we all get so focused or attached to specific models, formats, or entity concepts that it's quite refreshing to strip it all bare.

Evidence
    Source: Conversation with Grandma
    Confidence: not so high. Grandma wasn't so sure herself
    Person
        Name: John Doe
        Note: her grandfather, my g-g-grandfather
        Death: between 1904 and 1906 in Ourtown, Alabama. She was not yet in school, and she started school in 1906
        Birth: around 1830 in Histown, Connecticut. He was in his seventies
    End Person
End Evidence
 
 
Evidence
    Source: Handwritten note by Jane Doe, daughter of John Doe
    Note: High quality scan of note available
    Confidence: high
    Person
        Name: John Doe
        Death: John died of a heart attack on the 3rd of April 1905, in the front yard of his home in Ourtown, while carrying groceries he just brought from the store
    End Person
End Evidence
 
 
Evidence
    Source: Copy of death cert for John D Doe
    Confidence: looks official but have to find out more about the burial
    Person
        Name: John D Doe
        Death: 1905-04-03 at 11am in Ourtown, Alabama aged 74
        CauseOfDeath: Heart attack, confirmed by Dr Doctor
        Note: Handwritten note on death certificate saying "Buried 3 days later, St. Michael's"
    End Person
End Evidence
 
Evidence
    Source: Birth cert of Johann Doe
    Note: this is the only J. Doe to be found in Histown, Connecticut that was born in 1830 +-1
    Confidence: The data is trustworthy, and it's a really good fit. Could be my John D Doe, but needs more proof!
    Person
        Name: Johann Doe
        Birth: 21st December 1830
        Parents: Hans and Marie Doe
        PlaceOfBirth: Histown, Connecticut
        Baptised: 25th Dec, Catholic
    End Person
End Evidence
 
 
Evidence
    Source: Birth cert of John Dorian Doe
    Note: Grandma found this in a drawer with other family documents
    Confidence: quite high
    Note: Grandma says that she wasn't sure about the "Connecticut" anyhow
    Person
        Name: John Dorian Doe
        Birth: 3/3/1830
        Parents: Stephen Doe and his wife Hilda, née Schmidt
        PlaceOfBirth: Histown, Massachusetts
        Note: he was their 2nd child
    End Person
End Evidence
 
 
Evidence
    Source: Death of Johann Doe. Article in the Histown (Connecticut) Paper, 8th Sept 1834
    Text: "A fire consumed Doe's smithy yesterday, one of the oldest buildings in town..."
    Text: "the blacksmith Hans Doe couldn't save his youngest son, Johann, age 3 and 3/4..."
    Note: this is to show that Johann Doe of Histown Connecticut is not my John D Doe
 
    Person
        Name: Johann Doe
        Death: 7 September 1834 aged about 3 years, 9 months
        PlaceOfDeath: Histown, Connecticut
        Birth: 1830 or 1831, subtracting age from date-of-death
    End Person
End Evidence
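
Since the notation is semi-formal, it can already be read mechanically. Here is a rough Python sketch of a reader for it, assuming every data line has the form "Key: value" (other lines are ignored) and that only Person blocks occur. `parse_evidence` is a name invented for this illustration and does not imply any particular data model.

```python
from typing import Dict, List


def parse_evidence(text: str) -> List[Dict]:
    """Parse 'Evidence ... End Evidence' units into plain dictionaries."""
    units: List[Dict] = []
    unit = person = None
    for raw in text.splitlines():
        line = raw.strip()
        lowered = line.lower()
        if lowered == "evidence":
            unit = {"fields": [], "persons": []}
        elif lowered == "end evidence":
            units.append(unit)
            unit = None
        elif lowered == "person":
            person = []
        elif lowered == "end person":
            unit["persons"].append(person)
            person = None
        elif ":" in line and unit is not None:
            key, value = (part.strip() for part in line.split(":", 1))
            # Keep (key, value) pairs in order; keys such as Note may repeat.
            (person if person is not None else unit["fields"]).append((key, value))
    return units


sample = """
Evidence
    Source: Copy of death cert for John D Doe
    Confidence: looks official but have to find out more about the burial
    Person
        Name: John D Doe
        Death: 1905-04-03 at 11am in Ourtown, Alabama aged 74
        CauseOfDeath: Heart attack, confirmed by Dr Doctor
    End Person
End Evidence
"""
unit = parse_evidence(sample)[0]
print(unit["fields"][0])
print(dict(unit["persons"][0])["Name"])
```

The values stay as free text on purpose: interpreting "1905-04-03 at 11am in Ourtown" as a date, time and place would already be a step beyond the raw evidence.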


Tony
ttwetmore 2012-05-15T11:29:18-07:00
Tony,

For me, every example I see or create has a data model behind it, either real or imaginary. So I would never say that it's not time to be thinking about models. I believe an important aspect of human intelligence is the constantly executing model-maker that runs in the background of our brains, always trying to find the patterns that make the world make sense. When I did my example, I had the DeadEnds model in mind, even though it is still mostly a figment of my imagination. My example is, of course, not a model, but for me, it is the expression of something that fits the model.

When I read your example I inferred certain things about your model, for example that your model will allow you to represent evidence as bags of person and place "records"/sub-structures with accompanying source material. When I read Louis's example I think I am seeing into his ideas about his model, where his bags are potentially even bigger, with substructures from what would be, for me, many items of evidence. You and I definitely have personas in our models. I also have multi-level personas up to conclusion persons. In your case I'm not yet sure about other person concepts. Louis has a single person concept, the conclusion person, and he seems to cut and paste and annotate information from his source records to build those conclusion persons. Louis's model of a source includes the information that is in the source, while yours and mine separates the sources from their contents. For me these are all aspects of the underlying models that are in our minds.

Tom
louiskessler 2012-05-15T11:43:14-07:00
Tom said: "Louis's model of a source includes the information that is in the source, while yours and mine separates the sources from their contents."

Tom. You said it perfectly. That's exactly the difference.

Louis
ttwetmore 2012-05-15T12:48:34-07:00
Louis,

What is interesting about these discussions is that we all know what the important information is, and we all record that important information. The differences come in how we design the "containers" to hold the information. From the 50,000 foot level it sometimes doesn't seem all that important what the containers are, as long as all the information exists and the important links between information of different types exist and are consistent.

Tom
ACProctor 2012-05-15T12:53:03-07:00
Well, I'm glad we got that sorted ;-)

Back to this test-case now...

What's everyone's opinion on the idea of separating the evidence from the reasoning/conclusion in a test-case, and making it semi-formal like this?

I realise my examples here are very "fluid" but that could be tightened up a bit. There must be a wealth of potential torture cases and basic unit-level test cases, all of which we'd need to represent in a way that people can follow.

As well as seeing them represented in existing data models, we'd also need to check their representation in each refinement of a new data model.

Tom - I don't want to get bogged down in semantics but there is no real storage data model in this notation. It simply acknowledges that each source (or "unit") of evidence contains facts about a number of people and/or places.

Tony
Andy_Hatchett 2012-05-15T14:48:05-07:00
Glad to see discussion on this but...
Being a non-tech person my head is spinning following this and also trying to read Tony's Stemma write-up at:

http://www.parallaxview.co/familyhistorydata/

Ah well- at least it keeps me off the streets!

;)
louiskessler 2012-05-15T19:23:49-07:00

Tony:

Yes, the reasoning/conclusion and all subjective things should be completely separated from the source and source details which should only be facts without interpretation.

Also, I don't like your liberal use of the word "evidence". Evidence doesn't just exist on its own. Sources and source details exist on their own. Sources and source details become evidence ONLY when they are used to support (or counter) some reasoning/conclusion.

A birth certificate (the source) states: "John Doe, born in Chicago in 1850". That statement is a source detail. It is not evidence on its own.

Once you have added the birth date and place to the information for your John Doe, then you have used the birth certificate as evidence about your John Doe's birth.

See the subtle use of evidence. It is like the linkage between the conclusion and the source.

Louis
ttwetmore 2012-05-15T19:33:40-07:00
Tony said, "Tom - I don't want to get bogged down in semantics but there is no real storage data model in this notation. It simply acknowledges that each source (or "unit") of evidence contains facts about a number of people and/or places."

No need to respond, but I don't know what a real storage data model is, nor do I understand how it bears on this discussion.

Tom
ttwetmore 2012-05-15T20:06:00-07:00
Tony, Louis,

I agree that reasoning and conclusions should be separated from sources. However, I believe that evidence extracted from sources can be subject to a little editorial patching. For example, if a record contains a blatant error, I believe the data as it exists in the evidence should be recorded, but the corrected information should also be included in the evidence record, maybe it should even be the main information, with the actual data subsumed into a note. You might argue that correcting a blatant error in evidence is the wrong thing to do, but I feel we must be driven by pragmatics and common sense as much as by anything.

I have seen arguments that the first layer of conclusions is just coming to an understanding of what the evidence actually means (e.g., whether the marks on the certificate that seem to say "New London, Conn", mean the actual city of New London, in New London County, in the State of Connecticut), before then using that evidence to make any conclusions about persons. Yeah, yeah, these are understandable arguments, but in the larger scheme I see them as silly, confusing to almost everyone, and of little to no practical use.

I understand the argument that facts aren't evidence until used in making a conclusion, but I call facts evidence immediately, and I don't see any reason to worry about it. It doesn't change anything about how we record information, or how we construct a model. When I am researching someone, and I am collecting facts about every person with the same name who might potentially be that person, a purist would argue that my facts aren't evidence until I make an explicit decision from them. Okay, I agree, but I also ask, why does it matter? If it does it's too subtle for me to understand. I collect facts to make conclusions, and I am willing to go out on the limb and believe that I will be able to use the information for conclusions one way or another soon. And the facts don't even have to be about people that might be people I am currently interested in. I'm comfortable calling any fact gleaned from anywhere evidence, as I can recognize that every fact is always on the verge of its own incipient evidence-hood, and I'd rather have one word to describe all the facts I collect, rather than two.

Tom
ttwetmore 2012-05-15T20:08:53-07:00
Louis said, "See the subtle use of evidence. It is like the linkage between the conclusion and the source."

It's not subtle at all in my opinion. It is what evidence is.
louiskessler 2012-05-15T20:39:10-07:00

Tom,

What you say makes sense. I especially like your statement of "incipient evidence-hood" which is true.

Louis
ACProctor 2012-05-16T09:24:16-07:00
Re: "Also, I don't like your liberal use of the word 'evidence'"

Louis - let's just agree that it's an emotive word. In general discussions, I'm not totally happy with the word either, and have already written about it at E&C.

In the context of defining test cases, though, the word is more appropriate since this is the "evidence" that the reasoning & conclusion parts will be referring to. Unlike a real-life case, this notation need only present examples of useful data for the test case, i.e. data which will be referred to and which is likely to contribute to a conclusion in some positive or negative way.

Tom - The same can be said of "blatant errors in source records". As we're discussing test cases, and the representation of the data for test cases, we can assume that all data is an authentic transcription of some records (which may even be fictitious ones), except in the case of unit tests that specifically want to test the way that such discrepancies [between what's written and what's read back] are handled.

Tony
testuser42 2012-05-13T12:46:04-07:00
At soc.genealogy.computing, Ian Goddard has a lengthy historical example of wrong conclusions that lead to a long rat tail of wrong conclusions.
Message-ID: <a0vl9vFn85U1@mid.individual.net>
"Recycling conclusions as evidence - a counter-example"
I'm not going to copy-and-paste the whole thing, but I was wondering if this could be another test case.

The gist of the case: Someone makes a wrong conclusion: X is a daughter of Y and Z. This is put down in books and others take it for the truth and connect their work to this (faulty) evidence.

I think there are a few questions here.
If people build their trees from the known to the unknown, the error of connecting to a wrong person should be easily undone. Cut off that branch, it's wrong. Everything else is still right.
How would this be reflected in a BG file, in 1, 2, n-tiers?

Also, how did we find out that there is an error? What are the steps to disprove the conclusion that we found in a book? How would a BG look before and after? Or in between, when we were collecting evidence that disagrees with the old conclusion?
ttwetmore 2012-05-13T13:58:22-07:00
Here is the DeadEnds version for the example evidence.

1. Source: Conversation with Grandma
Person of interest: John Doe (her grandfather, my g-g-grandfather)
PFACTs:
- Death: between 1904 and 1906 in Ourtown, Alabama (she was not yet in school, and she started school in 1906)
- Birth: around 1830 in Histown, Connecticut (he was in his seventies)
Confidence: not so high, Grandma wasn't so sure herself

person: {
id: aaaaa
name: John /Doe/
death: {
date: between 1904 and 1906 { note: He died before the source began school in 1906. }
place: Ourtown, Alabama, United States
}
birth: {
date: about 1830 { note: Source states John was in his seventies when he died. }
place: Histown, Connecticut, United States
}
source: {
type: conversation
person: ... name of Grandma ... { note: She is a granddaughter of John Doe. }
quality: medium { note: Source is not sure of this information. }
}
}


2. Source: Handwritten note by Jane Doe, daughter of John Doe
Person: John Doe
PFACTs:
- John Doe died of a heart attack on the 3rd of April 1905, in the front yard of his home in Ourtown, while carrying groceries he just brought from the store.
File: scan of the note.
Confidence: pretty good.

person: {
id: bbbbb
name: John /Doe/
death: {
date: 3 April 1905
place: Ourtown, Alabama, United States
cause: heart attack
note: He died in the front yard of his home while carrying groceries.
}
source: {
type: note
author: Jane Doe { note: Source is a daughter of John Doe. }
quality: high
media: ... URI or file path of the scanned note ...
}
}

3. Source: Copy of death cert for John D Doe
- John D Doe died 1905-04-03 at 11 a.m. in Ourtown, Ala.
- aged 74
- Cause of death: Heart attack, confirmed by Dr Doc.
- Handwritten note: "Buried 3 days later, St. Michael's"
File: scanned document.
Confidence: looks official, but have to find out more about the burial.

person: {
id: ccccc
name: John D /Doe/
death: {
name: John D Doe
age: 74 years
date: 3 April 1905
time: 11:00 am
place: Ourtown, Alabama, United States
cause: Heart attack { note: Cause confirmed by doctor on the death certificate. }
}
burial: {
date: 6 April 1905
cemetery: Saint Michael's
note: This information comes from a hand written note on the death certificate.
}
source: {
type: death certificate
... other properties if available to identify source of the certificate ...
media: ... URI or file path to scanned copy ...
quality: high
}
}

4. Source: Birth cert of Johann Doe
Note: this is the only J. Doe to be found in Histown, Conn that was born in 1830 +-1
Person: Johann Doe
- Born 21st December 1830
- Parents Hans and Marie Doe
- in Histown, Conn
- baptised 25th Dec, catholic
File: scan
Confidence: The data is trustworthy, and it's a really good fit. Could be my John D Doe, but needs more proof!

person: {
id: ddddd
name: Johann /Doe/
birth: {
date: 21 December 1830
place: Histown, Connecticut, United States
}
father: { name: Hans /Doe/ }
mother: { name: Marie }
baptism: {
date: 25 December 1830
denomination: Catholic
}
note: This is the only J. Doe that was found to be born in Histown, Connecticut, between 1829 and 1831.
source: {
type: birth certificate
media: ... URI or filepath of scanned copy ...
quality: high
}
}

5. Source: Birth cert of John Dorian Doe
Grandma found this in a drawer with other family documents.
Person: John Dorian Doe
- Born 3/3/1830
- to Stephen Doe and his wife Hilda, née Schmidt
- in Histown, Massachusetts
- he was their 2nd child
File: Scan
Confidence: quite high.
Note: Grandma says that she wasn't sure about the "Conn" anyhow.

person: {
id: eeeee
name: John Dorian /Doe/
birth: {
date: 3 March 1830
place: Histown, Massachusetts, United States
}
father: { name: Stephen /Doe/ }
mother: { name: Hilda /Schmidt/ }
note: He was the second child in his family.
source: {
type: birth certificate
repository: Grandma's house
quality: high
media: ... URI or filepath ...
note: Grandma was not sure about Connecticut.
}
}

6. Source: Death of Johann Doe
Article in the Histown (Conn) Paper, 8th Sept 1834
"A fire consumed Doe's smithy yesterday, one of the oldest buildings in town..."
"the blacksmith Hans Doe couldn't save his youngest son, Johann, age 3 and 3/4..."
Note: this is to show that Johann Doe of Histown Conn is not my John D Doe.

person: {
id: fffff
name: Johann /Doe/
death: {
name: Johann /Doe/
age: about 3 years, 9 months
date: 7 September 1834
place: Histown, Connecticut, United States
}
birth: {
date: 1830 or 1831
note: Date computed from age at death.
}
father: { name: Hans /Doe/ }
source: {
type: newspaper article
text: "A fire consumed Doe's smithy yesterday, one of the oldest buildings in town..." "the blacksmith Hans Doe couldn't save his youngest son, Johann, age 3 and 3/4..."
quality: high
}
note: This proves that Johann Doe of Histown, Connecticut, is not the John D Doe of interest.
}

After creating these six evidence level persona records we would create a conclusion person that refers to (links to using ids) the personas that we conclude refer to the John Doe of interest. It would be the researcher's choice as to whether they'd get rid of the two personas for Johann Doe. Software would refer to the four personas to retrieve other data about John Doe. Note that I put his name and birth and death info in the conclusion record. This is done in order to choose which information from the evidence personas should be used to summarize the person at the conclusion level.

person: {
id: zzzzz
name: John Dorian /Doe/
birth: {
date: 3 March 1830
place: Histown, Massachusetts, United States
}
death: {
date: 3 April 1905
place: Ourtown, Alabama, United States
}
person: { id: aaaaa }
person: { id: bbbbb }
person: { id: ccccc }
person: { id: eeeee }
source: {
type: conclusion
proofStatement: ... Proof statement that explains why these four personas are the ones that refer to the John Doe of interest ...
}
}

At this point I would create two persona records for the parents, and two person records for the parents. There would also be a conclusion level family record to link the three conclusion level person records, basically just like GEDCOM.
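
As a side check on the example data (not part of DeadEnds itself), the ages implied by the two candidate birth certificates can be computed directly. This is plain date arithmetic in Python; `age_in_years` is a helper written for this sketch, and the dates are the ones given in the evidence list above.

```python
from datetime import date


def age_in_years(born: date, on: date) -> int:
    """Completed years between a birth date and a later date."""
    before_birthday = (on.month, on.day) < (born.month, born.day)
    return on.year - born.year - before_birthday


death = date(1905, 4, 3)         # death certificate of John D Doe, "aged 74"
john_dorian = date(1830, 3, 3)   # birth cert, Histown, Massachusetts
johann = date(1830, 12, 21)      # birth cert, Histown, Connecticut

print(age_in_years(john_dorian, death))  # 75
print(age_in_years(johann, death))       # 74

# Newspaper article: Johann died 7 September 1834, "age 3 and 3/4".
fire = date(1834, 9, 7)
print(round((fire - johann).days / 365.25, 2))  # 3.71
```

Taken at face value, the certificate's "aged 74" agrees with the Johann birth date rather than the John Dorian one, which may be part of why Johann looked like such a good fit. A proof statement for the conclusion person would presumably need to address that one-year discrepancy, and a test case could check that a format has somewhere to record it.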
ttwetmore 2012-05-13T13:58:53-07:00
Sorry the indentation got all messed up.
ACProctor 2012-05-14T02:29:29-07:00
That example of Ian's is interesting, but the narrative style takes a few goes to read and fully assimilate [well, it did for me].

I agree with Klemens [testuser42] that it would make a good test case. What's the best way of representing its essence though? If you could have a program that would crank-out a DeadEnds or GedcomX version then you'd need a formal representation of the case that it could read as input. Even then, there might be variations of how a format might represent the same test case.

Given the different approaches to handling E&C (and the intermediate "reasoning"), I would suggest we could start with a generic notation for just representing the evidence (i.e. the transcribed PFACTs, citations, etc) with no interpretation.

All the reasoning and conclusions could be represented as simple narrative text that refers back to the evidence notation.

It feels like that could give us some flexibility for translating the case into specific models & data-formats, some of which are more different than others. I still think that has to be a manual process though.



Tony
testuser42 2012-05-14T04:23:18-07:00
Tom, thank you for your DeadEnds version! If you want the indentation to work, I think you need to use [[code]] tags (when using these you need to add manual line breaks for long lines).

I see you are putting the Source inside the Person. I guess this is an alternate way, as long as the source really is only about one person. It definitely is "smooth" for a human looking at the file... But might this be a little inconvenient for the programmer?

I can guess at how the output would look like after every step, before you have any of the later information. Maybe you could still show how the conclusion person would look after step 4 (if you concluded that this "Johann" was the "John" you're looking at, but still recording your low confidence) and then after step 5 (where you discard the last conclusion because of the new evidence). Maybe keep a note to remind you of the wrong lead, to follow up with more definite evidence.
testuser42 2012-05-14T04:27:54-07:00
Tony, I agree, it is not easy to work through narrative examples. A simple notation would be very helpful. Some kind of Tag-Value system should be sufficient? I tried that with the list above, but even there I did not stay consistent...
ACProctor 2012-05-14T05:23:24-07:00
I'm trying to imagine a way of writing up a test case that formalises the representation of evidence (to some degree) but not the reasoning or conclusions related to it. It's an interesting exercise to try and banish all thoughts of existing models or data-formats and just represent what's there, and then talk about it in a free-form way. It's hard to not try and create mental data structures too early.

That would mean we'd need to forget about any tree representation, forget about FAM records and offspring linkages, forget about the mechanics of citation-elements/sources, forget about Event entities, etc.

This is totally off-the-cuff so I may change my mind before lunch but imagine something like a unit of evidence...

Evidence tag
Source description-of-where-it-came from

Person tag
PFACTs ...
notes ...
End Person
Place tag
PFACTs ...
notes ...
End Place
End Evidence

The source description could be as formal or informal as necessary as long as all the information is there. Each bit of evidence says something about people or places, but - since it's only evidence - we shouldn't rush to create real Persons, or join them to similarly-named people in other units of evidence. In effect, the Person/End-Person is an evidence-person (or persona), and the Place/End-Place is a similar concept for the place.

Note that this is only a written notation and shouldn't be mistaken for a data-format, or even a model.

The tags would be some unique names that could be referenced in some narrative analysis of the evidence.

I know I probably need to create an example to explain this more clearly. I think Tom's DeadEnds data could be easily factored into this style.

What do you think folks?

Tony
ACProctor 2012-05-14T08:37:05-07:00
Sorry, I messed up my indentation too. I'll have another go..

Evidence tag
    Source: description-of-where-it-came-from
 
    Person tag
        PFACTs ...
        notes ...
    End Person
    ...more persons mentioned...
 
    Place tag
        PFACTs ...
        notes ...
    End Place
    ...more places mentioned...
End Evidence

Here's a simple example of a birth registration. It also presents global properties, i.e. ones not specifically related to a single person or place:

Evidence birthRegMKirk
    Source: certificate of civil birth registration [1840/Q1/14/387]
    Informant: pWilliam
    DateOfReg: 1840-03-28
 
    Person pMelicent
        Name: Melicent Kirk
        DateOfBirth: 1840-03-17
        PlaceOfBirth: plGrantham
        Sex: girl
        Father: pWilliam
        Mother: pHellen
    End Person
 
    Person pWilliam
        Name: William Kirk
        Occupation: Labourer
        Residence: plGrantham
    End Person
 
    Person pHellen
        Name: Hellen Kirk
        MaidenName: Hill
    End Person
 
    Place plGrantham
        Address: Grantham, Lincolnshire
    End Place
End Evidence

If there are several units of evidence - as in the example of Ian Goddard - then the reasoning and conclusion narrative can use the tags such as pWilliam to point to the relevant data without cluttering up the case being made.

I imagined that narrative could be in the form of short paragraphs, or even sentences, each making a distinct point.
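
As a rough illustration of how mechanical this notation is, here is a minimal Python sketch that reads one Evidence unit and reports tags that are referenced but never defined. It is only a sketch under stated assumptions: the function names `parse_evidence` and `dangling_references` are invented for this example, the sample is abridged from the birth registration above, and the rule for spotting a reference (a single word starting with `p` or `pl` followed by a capital letter) is a guess based on the tag names used here, not part of any agreed notation.

```python
import re

# Abridged copy of the birth registration example (assumption: layout as posted).
SAMPLE = """\
Evidence birthRegMKirk
    Source: certificate of civil birth registration [1840/Q1/14/387]
    Informant: pWilliam
    DateOfReg: 1840-03-28

    Person pMelicent
        Name: Melicent Kirk
        Father: pWilliam
        Mother: pHellen
    End Person

    Person pWilliam
        Name: William Kirk
        Residence: plGrantham
    End Person

    Person pHellen
        Name: Hellen Kirk
    End Person

    Place plGrantham
        Address: Grantham, Lincolnshire
    End Place
End Evidence
"""

# Assumed convention: references look like pName or plName.
TAG_LIKE = re.compile(r"^pl?[A-Z]\w*$")


def parse_evidence(text):
    """Parse one Evidence unit into global properties and tagged blocks."""
    unit = {"tag": None, "props": {}, "blocks": {}}
    current = None  # property dict of the Person/Place block being read
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("..."):
            continue  # skip blank lines and "...more persons..." placeholders
        if line.startswith("End "):
            current = None
        elif ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            target = current if current is not None else unit["props"]
            target[key] = value
        else:
            kind, tag = line.split(None, 1)
            if kind == "Evidence":
                unit["tag"] = tag
            else:
                current = {}
                unit["blocks"][tag] = {"kind": kind, "props": current}
    return unit


def dangling_references(unit):
    """Return tag-like values that do not name any block in the unit."""
    values = list(unit["props"].values())
    for block in unit["blocks"].values():
        values.extend(block["props"].values())
    return sorted({v for v in values
                   if TAG_LIKE.match(v) and v not in unit["blocks"]})


unit = parse_evidence(SAMPLE)
print(unit["tag"], sorted(unit["blocks"]), dangling_references(unit))
```

A check like `dangling_references` is useful mainly because hand-written tags are easy to misspell; the reasoning narrative can then rely on every tag it mentions actually existing in the evidence.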

Tony
ttwetmore 2012-05-14T10:05:46-07:00
TestUser said, "I see you are putting the Source inside the Person. I guess this is an alternate way, as long as the source really is only about one person. It definitely is "smooth" for a human looking at the file... But might this be a little inconvenient for the programmer?"

I think it is important to deal with two kinds of sources. The "one-of's" like a note from grandmother, and a big one, like a book or a register.

In DeadEnds I want the user to be able to either have the source be self-contained information at the place where the source reference is required, or have the source reference point off to a separate, self-contained source record. DeadEnds allows both approaches. In this case, for simplicity, I chose to keep the sources self-contained in the persona records. I could have done the example with external source records instead.
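
To make the two options concrete, here is a minimal Python sketch of a persona whose source is either embedded or a pointer to a separate record. This is not DeadEnds code or its actual record layout; the class names `Source`, `SourceRef` and `Persona`, the function `resolve_source`, and the id `reg1` are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Source:
    """A self-contained source description (hypothetical structure)."""
    title: str


@dataclass
class SourceRef:
    """A pointer to an independent source record, by id (hypothetical)."""
    id: str


@dataclass
class Persona:
    name: str
    # Either the source itself, or a reference to a separate record.
    source: Union[Source, SourceRef]


def resolve_source(persona: Persona, records: Dict[str, Source]) -> Source:
    """Return the persona's source, following the reference if there is one."""
    if isinstance(persona.source, SourceRef):
        return records[persona.source.id]
    return persona.source


# A "one-off" source embedded in the persona, and a shared register record.
records = {"reg1": Source("a register")}
inline = Persona("John Doe", Source("a note from grandmother"))
linked = Persona("John Doe", SourceRef("reg1"))

print(resolve_source(inline, records).title)
print(resolve_source(linked, records).title)
```

The cost for the programmer is the single `isinstance` branch in `resolve_source`; everything downstream can treat both kinds of persona the same way.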

Tom
ttwetmore 2012-05-14T22:58:09-07:00
Tony,

Your example is very interesting and leaves me with a very positive opinion. The only significant difference between your model and mine seems to be the extra evidence layer, which acts to provide a "bag" of personas and places and the source information that describes where the evidence is from.

If there are not going to be independent source records then your solution is more efficient because the personas and the places in the evidence bags don't need explicit references to the source information since they get it all for free by simply being in their bag.

If there are only going to be independent source records, then my solution is more efficient, especially when many personas are extracted from the same sources.

If we support both independent source records and source information embedded in personas and places (or in evidence bags as you do it), which is the DeadEnds approach, then how we trade off between those two types of sources will determine which solution is more efficient.

I think your solution is close to Louis's preferences. Both our solutions have full bodied personas which is absolutely critical in my opinion.

In the DeadEnds model I also allow places to be independent records or self-contained structures within other records. In my example I made all the places internal substructures. In your example you made all places independent records. (I'm defining "independent record" to mean anything that needs to have an id so it can be referred to from an external place.) I could have made Ourtown, Alabama, and Histown, Massachusetts, two independent place records, but to limit the number of records in the example I chose not to.

From this you can see that I have no qualms with allowing the same information to be represented in more than one way depending upon user preference. There was a Better GEDCOM requirement that said this was a bad thing. I'll never agree with that!

Tom
ttwetmore 2012-05-14T23:13:30-07:00
Testuser42 said, "I can guess at how the output would look like after every step, before you have any of the later information. Maybe you could still show how the conclusion person would look after step 4 (if you concluded that this "Johann" was the "John" you're looking at, but still recording your low confidence) and then after step 5 (where you discard the last conclusion because of the new evidence. Maybe keep a note to remind you of the wrong lead, to follow up with more definite evidence)"

In my opinion the researcher can wait until there are many persona level evidence records before ever committing to join subsets of them into conclusion persons. From your comment I get the idea that you think I would create a conclusion person immediately after talking to Grandma, and that at each step along the way I would update that conclusion person by either linking to another persona or by removing a link to a persona that I now believe does not pertain to the conclusion person.

I believe that either approach, and every one in between is acceptable, and whichever is chosen would be based on both the details of the cases being evaluated and on the preferences of the users. What I would want to insist on is that the user have the flexibility to do it any way they think best.
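
Both styles can be sketched with the same small structure. The following Python is only an illustration under assumptions: the class `ConclusionPerson`, its `link` and `unlink` methods, and the persona ids `p1`, `p2` and `p4` (numbered after the steps of the example) are hypothetical and not taken from DeadEnds or any other model discussed here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConclusionPerson:
    """A conclusion-level person that points at persona records by id."""
    name: str
    persona_ids: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)

    def link(self, persona_id: str, reason: str) -> None:
        """Attach a persona and record why."""
        if persona_id not in self.persona_ids:
            self.persona_ids.append(persona_id)
        self.notes.append(f"linked {persona_id}: {reason}")

    def unlink(self, persona_id: str, reason: str) -> None:
        """Detach a persona but keep a note about the discarded lead.

        Raises ValueError if the persona was never linked.
        """
        self.persona_ids.remove(persona_id)
        self.notes.append(f"unlinked {persona_id}: {reason}")


# Eager style: create the conclusion person at once and revise it per step.
eager = ConclusionPerson("John Doe")
eager.link("p1", "conversation with Grandma")
eager.link("p4", "Johann Doe birth cert, low confidence")
eager.unlink("p4", "later evidence points elsewhere")

# Late style: collect personas first, then join them in one go.
late = ConclusionPerson("John Doe", persona_ids=["p1", "p2"])

print(eager.persona_ids, len(eager.notes), late.persona_ids)
```

The only difference between the two styles is when `link` is called; the notes list is what preserves the history of a conclusion that was made and later unmade.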

I don't know at what stage in the research process described by the example I would have decided to first create the conclusion record for John Doe. I think I would have created it right at the beginning, because the first persona came from a direct descendant of the person of interest. That is, in the example case, you can be nearly 100% sure (except that Grandma could be senile and have completely forgotten everything about her grandfather) that the first persona record that you collected pertains to exactly the person you are interested in. Even though Grandma is not sure of the exact details of the vital events, she is sure of the existence of the person, and that is much more important than having dates and places.

Maybe I should try to do as you suggest, as I would almost assuredly have been updating the conclusion person as the facts unfolded.

Tom
louiskessler 2012-05-15T00:07:14-07:00

Testuser, et al.

I have posted on my blog what I propose Behold, with source-based data entry, would be like and how it would handle your six sources step-by-step:
http://www.beholdgenealogy.com/blog/?p=1094

That will help you see how the data is processed by Behold, but to me that doesn't have any relationship to how it would be exported. Behold would be able to export to BetterGEDCOM, GEDCOM X or even GEDCOM 5.5 (with just a few custom tags).

Tony: Not sure if my ideas are at all compatible with yours.

Tom: Our acknowledged disagreement is over personas. No need to rehash.

Louis
testuser42 2012-01-15T13:11:40-08:00
I'll try a simple example for the step-by-step idea:

1. Source: Conversation with Grandma
  • Person of interest: John Doe (her grandfather, my g-g-grandfather)
  • PFACTs:
  • - Death: between 1904 and 1906 in Ourtown, Alabama (she was not yet in school, and she started school in 1906)
  • - Birth: around 1830 in Histown, Connecticut (he was in his seventies)
  • Confidence: not so high, Grandma wasn't so sure herself

2. Source: Handwritten note by Jane Doe, daughter of John Doe
  • Person: John Doe
  • PFACTs:
  • - John Doe died of a heart attack on the 3rd of April 1905, in the front yard of his home in Ourtown, while carrying groceries he just brought from the store.
  • File: scan of the note.
  • Confidence: pretty good.

3. Source: Copy of death cert for John D Doe
  • - John D Doe died 1905-04-03 at 11 a.m. in Ourtown, Ala.
  • - aged 74
  • - Cause of death: Heart attack, confirmed by Dr Doc.
  • - Handwritten note: "Buried 3 days later, St. Michael's"
  • File: scanned document.
  • Confidence: looks official, but have to find out more about the burial.

4. Source: Birth cert of Johann Doe
  • Note: this is the only J. Doe to be found in Histown, Conn that was born in 1830 +-1
  • Person: Johann Doe
  • - Born 21st December 1830
  • - Parents Hans and Marie Doe
  • - in Histown, Conn
  • - baptised 25th Dec, catholic
  • File: scan
  • Confidence: The data is trustworthy, and it's a really good fit. Could be my John D Doe, but needs more proof!

5. Source: Birth cert of John Dorian Doe
  • Grandma found this in a drawer with other family documents.
  • Person: John Dorian Doe
  • - Born 3/3/1830
  • - to Stephen Doe and his wife Hilda, née Schmidt
  • - in Histown, Massachusetts
  • - he was their 2nd child
  • File: Scan
  • Confidence: quite high.
  • Note: Grandma says that she wasn't sure about the "Conn" anyhow.

6. Source: Death of Johann Doe
  • Article in the Histown (Conn) Paper, 8th Sept 1834
  • "A fire consumed Doe's smithy yesterday, one of the oldest buildings in town..."
  • "the blacksmith Hans Doe couldn't save his youngest son, Johann, age 3 and 3/4..."
  • Note: this is to show that Johann Doe of Histown Conn is not my John D Doe.
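
To show what a program would have to notice in steps 3 to 6, here is a minimal Python sketch of the elimination. It is a sketch under stated assumptions, not a proposal for any model: the `Persona` class, the `conflicts` function and the tags `p3` to `p6` are hypothetical, the death date of Johann is read as 7 Sept 1834 because the article of 8 Sept says "yesterday", and only exact dates are compared, so Grandma's approximate dates from step 1 are left out.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Persona:
    """One person as described by one source (hypothetical structure)."""
    tag: str
    name: str
    birth: Optional[date] = None
    death: Optional[date] = None


def conflicts(a: Persona, b: Persona) -> bool:
    """True when both record a birth (or both a death) and the dates differ."""
    for attr in ("birth", "death"):
        x, y = getattr(a, attr), getattr(b, attr)
        if x is not None and y is not None and x != y:
            return True
    return False


death_cert = Persona("p3", "John D Doe", death=date(1905, 4, 3))
johann_birth = Persona("p4", "Johann Doe", birth=date(1830, 12, 21))
john_birth = Persona("p5", "John Dorian Doe", birth=date(1830, 3, 3))
johann_death = Persona("p6", "Johann Doe", death=date(1834, 9, 7))

# After step 4 nothing rules Johann out: the death cert gives no birth date.
print(conflicts(death_cert, johann_birth))

# After step 6, Johann's birth and death together contradict the death cert.
johann = Persona("p4+p6", "Johann Doe",
                 birth=johann_birth.birth, death=johann_death.death)
print(conflicts(death_cert, johann))

# The two birth certificates cannot describe the same child.
print(conflicts(johann_birth, john_birth))
```

The interesting part for comparing models is the middle step: some record has to hold the combination of the step 4 and step 6 personas before the contradiction with the death certificate becomes visible.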
testuser42 2012-01-15T13:15:48-08:00
Are the above good for a starting test case?
If not, can we tweak them?
If they're OK, let's put them on the page and then see how the data would look in DeadEnds, Behold's model, STEMMA, etc.
ACProctor 2012-01-16T09:28:31-08:00
Thanks for starting this thread. We should have a stock of these examples in an easily found location, and associated implementations and discussion under each one.

Is this the best place though, given that discussions are not backed up? We don't have any hierarchical support on this wiki otherwise.

The phrase "catch 22" comes to mind :-(

Tony
testuser42 2012-02-18T07:27:36-08:00
I think we can use a regular page (which does back up) and just simulate a hierarchy with indentations. Kind of like the wikipedia discussion pages. I tried to do something like that for the List of main Citation Elements.
louiskessler 2012-02-18T08:46:52-08:00
That's a good idea. I could write up what I believe my evidence/conclusion modeling would look like.

But your examples above are all too simplistic. They are all simple pieces of evidence that can be handled similarly. Only the conclusions are a bit different which really only has to be a text string. So those examples don't show the failings in any system that disaggregates data.

You should add examples of a Census record which includes multiple generations and side-relatives of a family with piles of info in the one record.

Or how about something like a seating list at your grandfather's wedding which shows one interesting table of people sitting together that leads you to believe they may have been relatives of each other or had some other relationship (friends, coworkers).

How about a clipping about a town fire you didn't know about that destroyed your ancestor's neighbors' homes and that makes you believe it affected their life, causing them and possibly other relatives to move to a different city. But nothing specific about your ancestor is mentioned in the article.

Louis
testuser42 2012-05-13T12:28:18-07:00
Hi again...
Louis, sure, more complex examples are definitely needed, too.

The idea behind the list above (Jan 15) was that these simple examples would make it easier to see how the data is processed and exported. I'm thinking of a hypothetical program in which you enter the data of the first step. Then you export to "BG". Then you add more, export again. Repeat... A "step-by-step" example might show where the differences are between an n-tier model and a 1- or 2-tier model. The first step would probably result in pretty similar BG output. But when more evidence is added, the models start to differ. When conclusions are made, how do these look? And when a conclusion is unmade?

I had you and Tom in mind... How would Behold handle these simple cases, how would the exported BG look? And what would DeadEnds do? Or Tony's STEMMA? Or the current GedcomX, if anyone is able to answer that already. These are the very basic building blocks, I think we should compare and understand them before going to the possibly harder details.
testuser42 2012-05-13T13:10:50-07:00
GEDCOM Test File
The developers at the German GEDCOM-L mailing list have started building an example GEDCOM file that is intended to show what they have agreed on. It will have every possible tag in GEDCOM 5.5.1 and the (few) custom tags that they want to use. They want to include a number of special and difficult cases. This file is supposed to be used by the developers as a reference and as a test of their program's import and export capabilities.
It's not going to be full of dummy-persons: they are building a storyboard to develop a more realistic data set.

The wiki page is
http://wiki-de.genealogy.net/GEDCOM_Musterdatei
which has links to a number of to-do-lists and the current state of the example GEDCOM:
http://wiki-de.genealogy.net/GEDCOM_Musterdatei/Musterdatei
(right now, there's only this text on the wiki, no file yet to download)
The storyboard is at
http://wiki-de.genealogy.net/GEDCOM_Musterdatei/Storyboard
Andy_Hatchett 2012-05-13T15:14:55-07:00
Too bad they don't put the Google Translate widget on their website, it would certainly help those of us who are interested but are linguistically challenged.
louiskessler 2012-05-13T21:25:02-07:00
Nice find!
ACProctor 2012-05-14T02:39:54-07:00
I apologise for my ignorance here. What exactly is GEDCOM-L? I haven't come across it before.

Is it an attempt to enhance GEDCOM in a particular way?

Tony
theKiwi 2012-05-14T04:11:47-07:00
GEDCOM-L is a mailing list (L) about GEDCOM - in this instance it's a German one apparently. There is also an English one here https://listserv.nodak.edu/cgi-bin/wa.exe?A0=GEDCOM-L

Roger
testuser42 2012-05-14T04:40:38-07:00
Yes, it's a list where a number of German developers (I guess around 15-20 people) discuss how they can get their programs to understand each other. They went through all the GEDCOM 5.5.1 tags and explored how they should be used according to the standard. If there was any uncertainty, they developed a preferred solution and voted on it. They also agreed on a number of custom tags, like a level 0 _PLAC.
The tags that have been voted on are found here:
http://wiki-de.genealogy.net/Kategorie:GEDCOM-Tag-Fertig_abgestimmt

Andy, you can use http://translate.google.com/ and put the URL into the translation field. Then click the link that shows up in the field at the right. If the translation is too terrible, I can try and help out ;)
testuser42 2012-05-14T04:50:05-07:00
Just found this wiki page for an introduction to the list:
http://wiki-de.genealogy.net/GEDCOM-l
- they started 30. 11. 2009
- there are about 25 people involved.
Andy_Hatchett 2012-05-14T06:16:09-07:00
"Andy, you can use http://translate.google.com/ and put the URL into the translation field. Then click the link that shows up in the field at the right. If the translation is too terrible, I can try and help out ;) "

Thanks, I finally added the Translate widget to my Browser bar so now one copy/paste lets me translate any website.

Andy