BetterGedcom - BetterGEDCOM test suite

testuser42 2012-01-15T12:15:47-08:00

Simple Test Cases for starting out and comparing

In TestSuite01 - Test Data Format, Tony said: I imagined a fictitious family (or set thereof) that embraced all the real-world situations and vagaries that we want to support.
Given that we may make the cases public (e.g. to show vendors something they cannot handle at present), or produce real test-data later on that's based on the test-cases, then I think the names and d.o.b can be invented at least :-)
I think a good start is an informal list of all the things BG must support or handle properly. We can then expand on those to create associated test cases. And I have a think balloon above my head with a picture of the Munsters posing for a family portrait !!
I don't know the Munsters that well, so I can't say if they'd be perfect as an example family ;)

But, for starters, I think it would be good to put down on this page the informal list Tony mentioned. It would probably be condensed from the Better GEDCOM Requirements Catalog.

Also, I think examples that follow the possible step-by-step real-life process would be helpful:
- First there's a small bit of data you want to put into your program, let's say something you wrote down during a conversation with your grandma. How would an exported BG look at that point?
- Then you do some research about the topic, and find new sources and extract more pieces of evidence, and add these. How does the exported BG look now? (Would it look different if you added data in a few sessions and not in one chunk?)
- Later, maybe you'll find that one of the pieces you added before is not about the person you are researching. So you get rid of these connections, in whatever way your software provides. How might the exported BG look after this operation?

...and so forth, following the developments as they might happen.
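As a rough illustration of the step-by-step idea, here is a minimal Python sketch that takes a snapshot of an "export" after each step. Everything in it is an assumption made for illustration: the JSON layout, the `export` helper and the ids `p1`, `p2` and `johnDoe` are hypothetical and are not a proposed BG format.

```python
import json

def export(personas, links):
    """Serialise the current state deterministically (sorted keys and links)."""
    return json.dumps({"personas": personas, "links": sorted(links)},
                      sort_keys=True, indent=1)

personas = {}    # persona id -> recorded facts (hypothetical layout)
links = set()    # (conclusion id, persona id) pairs
snapshots = []   # one export per step

# Step 1: the conversation with grandma.
personas["p1"] = {"name": "John Doe", "source": "Conversation with Grandma"}
links.add(("johnDoe", "p1"))
snapshots.append(export(personas, links))

# Step 2: later research adds a new piece of evidence.
personas["p2"] = {"name": "Johann Doe", "source": "Birth cert of Johann Doe"}
links.add(("johnDoe", "p2"))
snapshots.append(export(personas, links))

# Step 3: p2 turns out to be someone else, so only the link is removed.
links.discard(("johnDoe", "p2"))
snapshots.append(export(personas, links))

for step, text in enumerate(snapshots, start=1):
    print("After step", step)
    print(text)
```

Because this export sorts its keys and links, entering the same data in several sessions or in one chunk gives the same output. Whether a real BG export should behave that way is exactly the kind of question the test cases could pin down.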

ACProctor 2012-05-15T08:35:13-07:00

Re: "Your example is very interesting and leaves me with a very positive opinion. The only significant difference between your model and mine seems to be the extra evidence layer, which acts to provide a "bag" of personas and places and the source information that describes where the evidence is from."

This is not a "model", though, Tom. I was merely trying to come up with a way of expressing test-cases that didn't presume a particular model or data-format. If the separation of evidence away from all reasoning and conclusion helps inspire a future direction for model development then that's good but it's not my main thought here. I've rushed a description of the units of evidence for the test-case we're discussing - I say "rushed" because I'm afraid I've left some of the inferences in the notes.

Louis - I apologise for mentioning that word [smiling here]. This notation was to be a semi-formal way of depicting just the evidence, separately from reasoning and conclusion, which obviously meant that each person that was referenced was merely an "evidence person". I think I should have left it at that since I don't want to confuse this notation with any existing data models or entity concepts. If a particular data-format has some concept of personae (...and there's a whole grey-scale of what that might entail) then there would be an easy way of generating them from this notation.

However, as I said, this shouldn't presume any particular data model or technique. In describing the "pure" evidence separately from the "thought parts" of the data, it was trying to find a completely neutral approach for describing a fundamental part of any data. I feel that we all get so focused or attached to specific models, formats, or entity concepts that it's quite refreshing to strip it all bare.

Evidence
    Source: Conversation with Grandma
    Confidence: not so high. Grandma wasn't so sure herself
    Person
        Name: John Doe
        Note: her grandfather, my g-g-grandfather
        Death: between 1904 and 1906 in Ourtown, Alabama. She was not yet in school, and she started school in 1906
        Birth: around 1830 in Histown, Connecticut. He was in his seventies
    End Person
End Evidence
 
 
Evidence
    Source: Handwritten note by Jane Doe, daughter of John Doe
    Note: High quality scan of note available
    Confidence: high
    Person
        Name: John Doe
        Death: John died of a heart attack on the 3rd of April 1905, in the front yard of his home in Ourtown, while carrying groceries he just brought from the store
    End Person
End Evidence
 
 
Evidence
    Source: Copy of death cert for John D Doe
    Confidence: looks official but have to find out more about the burial
    Person
        Name: John D Doe
        Death: 1905-04-03 at 11am in Ourtown, Alabama aged 74
        CauseOfDeath: Heart attack, confirmed by Dr Doctor
        Note: Handwritten note on death certificate saying "Buried 3 days later, St. Michael's"
    End Person
End Evidence
 
Evidence
    Source: Birth cert of Johann Doe
    Note: this is the only J. Doe to be found in Histown, Connecticut that was born in 1830 +-1
    Confidence: The data is trustworthy, and it's a really good fit. Could be my John D Doe, but needs more proof!
    Person
        Name: Johann Doe
        Birth: 21st December 1830
        Parents: Hans and Marie Doe
        PlaceOfBirth: Histown, Connecticut
        Baptised: 25th Dec, Catholic
    End Person
End Evidence
 
 
Evidence
    Source: Birth cert of John Dorian Doe
    Note: Grandma found this in a drawer with other family documents
    Confidence: quite high
    Note: Grandma says that she wasn't sure about the "Connecticut" anyhow
    Person
        Name: John Dorian Doe
        Birth: 3/3/1830
        Parents: Stephen Doe and his wife Hilda, née Schmidt
        PlaceOfBirth: Histown, Massachusetts
        Note: he was their 2nd child
    End Person
End Evidence
 
 
Evidence
    Source: Death of Johann Doe. Article in the Histown (Connecticut) Paper, 8th Sept 1834
    Text: "A fire consumed Doe's smithy yesterday, one of the oldest buildings in town..."
    Text: "the blacksmith Hans Doe couldn't save his youngest son, Johann, age 3 and 3/4..."
    Note: this is to show that Johann Doe of Histown Connecticut is not my John D Doe
 
    Person
        Name: Johann Doe
        Death: 7 September 1834 aged about 3 years, 9 months
        PlaceOfDeath: Histown, Connecticut
        Birth: 1830 or 1831, subtracting age from date-of-death
    End Person
End Evidence
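
The "subtracting age from date-of-death" step in the last unit can be sketched in Python. This is only an illustration under stated assumptions: the age is taken as completed whole years (3), dates are treated as plain calendar dates, and the helper names `shift_years` and `birth_window` are hypothetical.

```python
from datetime import date, timedelta

def shift_years(d, years):
    """Return d moved back by whole years (29 February falls back to 28 February)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)

def birth_window(death, age_years):
    """Earliest and latest birth dates for someone who had completed
    age_years whole years on the day of death."""
    latest = shift_years(death, age_years)
    earliest = shift_years(death, age_years + 1) + timedelta(days=1)
    return earliest, latest

# Johann Doe: died 7 September 1834, aged 3 (and about 3/4).
earliest, latest = birth_window(date(1834, 9, 7), 3)
print(earliest, latest)  # 1830-09-08 1831-09-07
```

The window spans 1830 and 1831, which matches the "1830 or 1831" recorded above, and the birth certificate date of 21st December 1830 falls inside it.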

Tony

ttwetmore 2012-05-15T11:29:18-07:00

Tony,

For me, every example I see or create has a data model behind it, either real or imaginary. So I would never say that it's not time to be thinking about models. I believe an important aspect of human intelligence is the constantly executing model-maker that runs in the background of our brains, always trying to find the patterns that make the world make sense. When I did my example, I had the DeadEnds model in mind, even though it is still mostly a figment of my imagination. My example is, of course, not a model, but for me, it is the expression of something that fits the model.

When I read your example I inferred certain things about your model, for example that your model will allow you to represent evidence as bags of person and place "records"/sub-structures with accompanying source material. When I read Louis's example I think I am seeing into his ideas about his model, where his bags are potentially even bigger, with substructures from what would be, for me, many items of evidence. You and I definitely have personas in our models. I also have multi-level personas up to conclusion persons. In your case I'm not yet sure about other person concepts. Louis has a single person concept, the conclusion person, and he seems to cut and paste and annotate information from his source records to build those conclusion persons. Louis's model of a source includes the information that is in the source, while yours and mine separates the sources from their contents. For me these are all aspects of the underlying models that are in our minds.

Tom

louiskessler 2012-05-15T11:43:14-07:00

Tom said: "Louis's model of a source includes the information that is in the source, while yours and mine separates the sources from their contents."

Tom. You said it perfectly. That's exactly the difference.

Louis

ttwetmore 2012-05-15T12:48:34-07:00

Louis,

What is interesting about these discussions is that we all know what the important information is, and we all record that important information. The differences come in how we design the "containers" to hold the information. From the 50,000 foot level it sometimes doesn't seem all that important what the containers are, as long as all the information exists and the important links between information of different types exist and are consistent.

Tom

ACProctor 2012-05-15T12:53:03-07:00

Well, I'm glad we got that sorted ;-)

Back to this test-case now...

What's everyone's opinion on the idea of separating the evidence from the reasoning/conclusion in a test-case, and making it semi-formal like this?

I realise my examples here are very "fluid" but that could be tightened up a bit. There must be a wealth of potential torture cases and basic unit-level test cases, all of which we'd need to represent in a way that people can follow.

As well as seeing them represented in existing data models, we'd also need to check their representation in each refinement of a new data model.

Tom - I don't want to get bogged down in semantics but there is no real storage data model in this notation. It simply acknowledges that each source (or "unit") of evidence contains facts about a number of people and/or places.

Tony

Andy_Hatchett 2012-05-15T14:48:05-07:00

Glad to see discussion on this but...
Being a non-tech person my head is spinning following this and also trying to read Tony's Stemma write-up at:

http://www.parallaxview.co/familyhistorydata/

Ah well- at least it keeps me off the streets!

;)

louiskessler 2012-05-15T19:23:49-07:00

Tony:

Yes, the reasoning/conclusion and all subjective things should be completely separated from the source and source details which should only be facts without interpretation.

Also, I don't like your liberal use of the word "evidence". Evidence doesn't just exist on its own. Sources and source details exist on their own. Sources and source details become evidence ONLY when they are used to support (or counter) some reasoning/conclusion.

Take a birth certificate (the source) that states: "John Doe, born in Chicago in 1850". That statement is a source detail. It is not evidence on its own.

Once you have added the birth date and place to the information for your John Doe, then you have used the birth certificate as evidence about your John Doe's birth.

See the subtle use of evidence. It is like the linkage between the conclusion and the source.
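
To make that distinction concrete, here is a minimal Python sketch. It is only one possible reading of the idea, and the class names `SourceDetail` and `Conclusion` and the function `is_evidence` are hypothetical, not part of any existing model.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class SourceDetail:
    """A statement found in a source, recorded without interpretation."""
    source: str
    text: str

@dataclass
class Conclusion:
    """A conclusion plus the source details used for or against it."""
    statement: str
    supporting: list = field(default_factory=list)
    countering: list = field(default_factory=list)

def is_evidence(detail, conclusions):
    """A detail counts as evidence only once some conclusion uses it."""
    return any(detail in c.supporting or detail in c.countering
               for c in conclusions)

detail = SourceDetail("birth certificate", "John Doe, born in Chicago in 1850")
conclusions = []
before = is_evidence(detail, conclusions)   # False: nothing refers to it yet

conclusions.append(
    Conclusion("My John Doe was born in Chicago in 1850", supporting=[detail]))
after = is_evidence(detail, conclusions)    # True: now linked to a conclusion
print(before, after)
```

In this reading "evidence" is not a separate kind of record at all, just the link between a conclusion and a source detail.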

Louis

ttwetmore 2012-05-15T19:33:40-07:00

Tony said, "Tom - I don't want to get bogged down in semantics but there is no real storage data model in this notation. It simply acknowledges that each source (or "unit") of evidence contains facts about a number of people and/or places."

No need to respond, but I don't know what a real storage data model is, nor do I understand how it bears on this discussion.

Tom

ttwetmore 2012-05-15T20:06:00-07:00

Tony, Louis,

I agree that reasoning and conclusions should be separated from sources. However, I believe that evidence extracted from sources can be subject to a little editorial patching. For example, if a record contains a blatant error, I believe the data as it exists in the evidence should be recorded, but the corrected information should also be included in the evidence record, maybe it should even be the main information, with the actual data subsumed into a note. You might argue that correcting a blatant error in evidence is the wrong thing to do, but I feel we must be driven by pragmatics and common sense as much as by anything.

I have seen arguments that the first layer of conclusions is just coming to an understanding of what the evidence actually means (e.g., whether the marks on the certificate that seem to say "New London, Conn", mean the actual city of New London, in New London County, in the State of Connecticut), before then using that evidence to make any conclusions about persons. Yeah, yeah, these are understandable arguments, but in the larger scheme I see them as silly, confusing to almost everyone, and of little to no practical use.

I understand the argument that facts aren't evidence until used in making a conclusion, but I call facts evidence immediately, and I don't see any reason to worry about it. It doesn't change anything about how we record information, or how we construct a model. When I am researching someone, and I am collecting facts about every person with the same name who might potentially be that person, a purist would argue that my facts aren't evidence until I make an explicit decision from them. Okay, I agree, but I also ask, why does it matter? If it does it's too subtle for me to understand. I collect facts to make conclusions, and I am willing to go out on a limb and believe that I will be able to use the information for conclusions one way or another soon. And the facts don't even have to be about people that might be people I am currently interested in. I'm comfortable calling any fact gleaned from anywhere evidence, as I can recognize that every fact is always on the verge of its own incipient evidence-hood, and I'd rather have one word to describe all the facts I collect, rather than two.

Tom

ttwetmore 2012-05-15T20:08:53-07:00

Louis said, "See the subtle use of evidence. It is like the linkage between the conclusion and the source."

It's not subtle at all in my opinion. It is what evidence is.

louiskessler 2012-05-15T20:39:10-07:00

Tom,

What you say makes sense. I especially like your statement of "incipient evidence-hood" which is true.

Louis

ACProctor 2012-05-16T09:24:16-07:00

Re: "Also, I don't like your liberal use of the word 'evidence'"

Louis - let's just agree that it's an emotive word. In general discussions, I'm not totally happy with the word either, and have already written about it at E&C.

In the context of defining test cases, though, the word is more appropriate since this is the "evidence" that the reasoning & conclusion parts will be referring to. Unlike a real-life case, this notation need only present examples of useful data for the test case, i.e. data which will be referred to and which is likely to contribute to a conclusion in some positive or negative way.

Tom - The same can be said of "blatant errors in source records". As we're discussing test cases, and the representation of the data for test cases, we can assume that all data is an authentic transcription of some records (which may even be fictitious ones), except in the case of unit tests that specifically want to test the way that such discrepancies [between what's written and what's read back] are handled.

Tony

testuser42 2012-05-13T12:46:04-07:00

At soc.genealogy.computing, Ian Goddard has a lengthy historical example of wrong conclusions that lead to a long rat tail of wrong conclusions.
Message-ID: <a0vl9vFn85U1@mid.individual.net>
"Recycling conclusions as evidence - a counter-example"
I'm not going to copy-and-paste the whole thing, but I was wondering if this could be another test case.

The gist of the case: Someone makes a wrong conclusion: X is a daughter of Y and Z. This is put down in books and others take it for the truth and connect their work to this (faulty) evidence.

I think there are a few questions here.
If people build their trees from the known to the unknown, the error of connecting to a wrong person should be easily undone. Cut off that branch, it's wrong. Everything else is still right.
How would this be reflected in a BG file, in 1, 2, n-tiers?

Also, how did we find out that there is an error? What are the steps to disprove the conclusion that we found in a book? How would a BG look before and after? Or in between, when we were collecting evidence that disagrees with the old conclusion?

ttwetmore 2012-05-13T13:58:22-07:00

Here is the DeadEnds version for the example evidence.

1. Source: Conversation with Grandma
Person of interest: John Doe (her grandfather, my g-g-grandfather)
PFACTs:
- Death: between 1904 and 1906 in Ourtown, Alabama (she was not yet in school, and she started school in 1906)
- Birth: around 1830 in Histown, Connecticut (he was in his seventies)
Confidence: not so high, Grandma wasn't so sure herself

person: {
id: aaaaa
name: John /Doe/
death: {
date: between 1904 and 1906 { note: He died before the source began school in 1906. }
place: Ourtown, Alabama, United States
}
birth: {
date: about 1830 { note: Source states John was in his seventies when he died. }
place: Histown, Connecticut, United States
};
source: {
type: conversation;
person: ... name of Grandma ... { note: She is a granddaughter of John Doe. }
quality: medium { note: Source is not sure of this information. }
}
}

2. Source: Handwritten note by Jane Doe, daughter of John Doe
Person: John Doe
PFACTs:
- John Doe died of a heart attack on the 3rd of April 1905, in the front yard of his home in Ourtown, while carrying groceries he just brought from the store.
File: scan of the note.
Confidence: pretty good.

person: {
id: bbbbb
name: John /Doe/
death: {
date: 3 April 1905
place: Ourtown, Alabama, United States
cause: heart attack
note: He died in the front yard of his home while carrying groceries.
}
source: {
type: note
author: Jane Doe { note: Source is a daughter of John Doe. }
quality: high
media: ... URI or file path of the scanned note ...
}
}

3. Source: Copy of death cert for John D Doe
- John D Doe died 1905-04-03 at 11 a.m. in Ourtown, Ala.
- aged 74
- Cause of death: Heart attack, confirmed by Dr Doc.
- Handwritten note: "Buried 3 days later, St. Michael's"
File: scanned document.
Confidence: looks official, but have to find out more about the burial.

person: {
id: ccccc
name: John D /Doe/
death: {
name: John D Doe
age: 74 years
date: 3 April 1905
time: 11:00 am
place: Ourtown, Alabama, United States
cause: Heart attack { note: Cause confirmed by doctor on the death certificate. }
}
burial: {
date: 6 April 1905
cemetery: Saint Michael's
note: This information comes from a hand written note on the death certificate.
}
source: {
type: death certificate
... other properties if available to identify source of the certificate ...
media: ... URI or file path to scanned copy ...
quality: high
}
}

4. Source: Birth cert of Johann Doe
Note: this is the only J. Doe to be found in Histown, Conn that was born in 1830 +-1
Person: Johann Doe
- Born 21st December 1830
- Parents Hans and Marie Doe
- in Histown, Conn
- baptised 25th Dec, catholic
File: scan
Confidence: The data is trustworthy, and it's a really good fit. Could be my John D Doe, but needs more proof!

person: {
id: ddddd
name: Johann /Doe/
birth: {
date: 21 December 1830
place: Histown, Connecticut, United States
}
father: { name: Hans /Doe/ }
mother: { name: Marie }
baptism: {
date: 25 December 1830
denomination: Catholic
}
note: This is the only J. Doe that was found to be born in Histown, Connecticut, between 1829 and 1831.
source: {
type: birth certificate
media: ... URI or filepath of scanned copy ...
quality: high
}
}

5. Source: Birth cert of John Dorian Doe
Grandma found this in a drawer with other family documents.
Person: John Dorian Doe
- Born 3/3/1830
- to Stephen Doe and his wife Hilda, née Schmidt
- in Histown, Massachusetts
- he was their 2nd child
File: Scan
Confidence: quite high.
Note: Grandma says that she wasn't sure about the "Conn" anyhow.

person: {
id: eeeee
name: John Dorian /Doe/
birth: {
date: 3 March 1830
place: Histown, Massachusetts, United States
}
father: { name: Stephen /Doe/ }
mother: { name: Hilda /Schmidt/ }
note: He was the second child in his family.
source: {
type: birth certificate
repository: Grandma's house
quality: high
media: ... URI or filepath ...
note: Grandma was not sure about Connecticut.
}
}

6. Source: Death of Johann Doe
Article in the Histown (Conn) Paper, 8th Sept 1834
"A fire consumed Doe's smithy yesterday, one of the oldest buildings in town..."
"the blacksmith Hans Doe couldn't save his youngest son, Johann, age 3 and 3/4..."
Note: this is to show that Johann Doe of Histown Conn is not my John D Doe.

person: {
id: fffff
name: Johann /Doe/
death: {
name: Johann /Doe/
age: about 3 years, 9 months
date: 7 September 1834
place: Histown, Connecticut, United States
}
birth: {
date: 1830 or 1831
note: Date computed from age at death.
}
father: { name: Hans /Doe/ }
source: {
type: newspaper article
text: "A fire consumed Doe's smithy yesterday, one of the oldest buildings in town..." "the blacksmith Hans Doe couldn't save his youngest son, Johann, age 3 and 3/4..."
quality: high
}
note: This proves that Johann Doe of Histown, Connecticut, is not the John D Doe of interest.
}

After creating these six evidence level persona records we would create a conclusion person that refers to (links to using ids) the personas that we conclude refer to the John Doe of interest. It would be the researcher's choice as to whether they'd get rid of the two personas for Johann Doe. Software would refer to the four personas to retrieve other data about John Doe. Note that I put his name and birth and death info in the conclusion record. This is done in order to choose which information from the evidence personas should be used to summarize the person at the conclusion level.

person: {
id: zzzzz
name: John Dorian /Doe/
birth: {
date: 3 March 1830
place: Histown, Massachusetts, United States
}
death: {
date: 3 April 1905
place: Ourtown, Alabama, United States
}
person: { id: aaaaa }
person: { id: bbbbb }
person: { id: ccccc }
person: { id: eeeee }
source: {
type: conclusion;
proofStatement: ... Proof statement that explains why these four personas are the ones that refer to the John Doe of interest ...
}
}
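
How software might "refer to the personas to retrieve other data" can be sketched in Python. This is an assumption-laden illustration and not DeadEnds code: the dictionaries carry only a few of the fields and personas from the records above, and the function `facts_for` is hypothetical.

```python
# A subset of the evidence-level personas above, keyed by id.
personas = {
    "aaaaa": {"name": "John /Doe/", "death": "between 1904 and 1906"},
    "bbbbb": {"name": "John /Doe/", "death": "3 April 1905"},
    "ccccc": {"name": "John D /Doe/", "death": "3 April 1905",
              "burial": "6 April 1905"},
}

# The conclusion person holds its own summary values plus links by id.
conclusion = {
    "id": "zzzzz",
    "name": "John Dorian /Doe/",
    "death": "3 April 1905",
    "personas": ["aaaaa", "bbbbb", "ccccc"],
}

def facts_for(conclusion, personas, key):
    """Return the conclusion's own value first, then each distinct value
    found in the linked personas, in link order."""
    values = []
    if key in conclusion:
        values.append(conclusion[key])
    for pid in conclusion["personas"]:
        value = personas[pid].get(key)
        if value is not None and value not in values:
            values.append(value)
    return values

print(facts_for(conclusion, personas, "death"))
print(facts_for(conclusion, personas, "burial"))
```

The burial is not repeated in the conclusion record, yet it is still reachable through the link to persona ccccc, which is the point of keeping the links.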

At this point I would create two persona records for the parents, and two person records for the parents. There would also be a conclusion level family record to link the three conclusion level person records, basically just like GEDCOM.

ttwetmore 2012-05-13T13:58:53-07:00

Sorry the indentation got all messed up.

ACProctor 2012-05-14T02:29:29-07:00

That example of Ian's is interesting, but the narrative style takes a few goes to read and fully assimilate [well, it did for me].

I agree with Klemens [testuser42] that it would make a good test case. What's the best way of representing its essence though? If you could have a program that would crank-out a Dead Ends or GedcomX version then you'd need a formal representation of the case that it could read as input. Even then, there might be variations of how a format might represent the same test case.

Given the different approaches to handling E&C (and the intermediate "reasoning"), I would suggest we could start with a generic notation for just representing the evidence (i.e. the transcribed PFACTs, citations, etc) with no interpretation.

All the reasoning and conclusions could be represented as simple narrative text that refers back to the evidence notation.

It feels like that could give us some flexibility for translating the case into specific models & data-formats, some of which are more different than others. I still think that has to be a manual process though.

Tony

testuser42 2012-05-14T04:23:18-07:00

Tom, thank you for your DeadEnds version! If you want the indentation to work, I think you need to use [[code]] tags (when using these you need to add manual line breaks for long lines).

I see you are putting the Source inside the Person. I guess this is an alternate way, as long as the source really is only about one person. It definitely is "smooth" for a human looking at the file... But might this be a little inconvenient for the programmer?

I can guess at how the output would look like after every step, before you have any of the later information. Maybe you could still show how the conclusion person would look after step 4 (if you concluded that this "Johann" was the "John" you're looking at, but still recording your low confidence) and then after step 5 (where you discard the last conclusion because of the new evidence. Maybe keep a note to remind you of the wrong lead, to follow up with more definite evidence)

testuser42 2012-05-14T04:27:54-07:00

Tony, I agree, it is not easy to work through narrative examples. A simple notation would be very helpful. Some kind of Tag-Value system should be sufficient? I tried that with the list above, but even there I did not stay consistent...

ACProctor 2012-05-14T05:23:24-07:00

I'm trying to imagine a way of writing-up a test case that formalises the representation of evidence (to some degree) but not the reasoning or conclusions related to it. It's an interesting exercise to try and banish all thoughts of existing models or data-formats and just represent what's there, and then talk about it in a free-form way. It's hard to not try and create mental data structures too early.

That would mean we'd need to forget about any tree representation, forget about FAM records and offspring linkages, forget about the mechanics of citation-elements/sources, forget about Event entities, etc.

This is totally off-the-cuff so I may change my mind before lunch but imagine something like a unit of evidence...

Evidence tag
Source description-of-where-it-came from

Person tag
PFACTs ...
notes ...
End Person
Place tag
PFACTs ...
notes ...
End Place
End Evidence

The source description could be as formal or informal as necessary as long as all the information is there. Each bit of evidence says something about people or places, but - since it's only evidence - we shouldn't rush to create real Persons, or join them to similarly-named people in other units of evidence. In effect, the Person/End-Person is an evidence-person (or persona), and the Place/End-Place is a similar concept for the place.

Note that this is only a written notation and shouldn't be mistaken for a data-format, or even a model.

The tags would be some unique names that could be referenced in some narrative analysis of the evidence.

I know I probably need to create an example to explain this more clearly. I think Tom's Dead Ends data could be easily factored into this style.

What do you think folks?

Tony

ACProctor 2012-05-14T08:37:05-07:00

Sorry, I messed up my indentation too. I'll have another go..

Evidence tag
    Source: description-of-where-it-came-from
 
    Person tag
        PFACTs ...
        notes ...
    End Person
    ...more persons mentioned...
 
    Place tag
        PFACTs ...
        notes ...
    End Place
    ...more places mentioned...
End Evidence

Here's a simple example of a birth registration. It also presents global properties, i.e. ones not specifically related to a single person or place:

Evidence birthRegMKirk
    Source: certificate of civil birth registration [1840/Q1/14/387]
    Informant: pWilliam
    DateOfReg: 1840-03-28
 
    Person pMelicent
        Name: Melicent Kirk
        DateOfBirth: 1840-03-17
        PlaceOfBirth: plGrantham
        Sex: girl
        Father: pWilliam
        Mother: pHellen
    End Person
 
    Person pWilliam
        Name: William Kirk
        Occupation: Labourer
        Residence: plGrantham
    End Person
 
    Person pHellen
        Name: Hellen Kirk
        MaidenName: Hill
    End Person
 
    Place plGrantham
        Address: Grantham, Lincolnshire
    End Place
End Evidence
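
Although this is a written notation rather than a data-format, it is regular enough to be read mechanically. The following Python sketch shows one way that might be done, using a shortened copy of the birth registration. It rests on assumptions that go beyond the description above: every block starts with "Evidence", "Person" or "Place" followed by a tag, every property is a "Key: value" line, and the function name `parse_evidence` is hypothetical.

```python
SAMPLE = """\
Evidence birthRegMKirk
    Source: certificate of civil birth registration [1840/Q1/14/387]
    Informant: pWilliam

    Person pMelicent
        Name: Melicent Kirk
        Father: pWilliam
    End Person

    Person pWilliam
        Name: William Kirk
        Residence: plGrantham
    End Person

    Place plGrantham
        Address: Grantham, Lincolnshire
    End Place
End Evidence
"""

def parse_evidence(text):
    """Parse one Evidence unit into global properties, persons and places."""
    unit = {"tag": None, "properties": {}, "Person": {}, "Place": {}}
    current = None  # the dict that "Key: value" lines are written into
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.lower().startswith("end "):
            current = unit["properties"]       # back to the evidence level
        elif ":" in line:
            key, value = line.split(":", 1)    # a "Key: value" property
            current[key.strip()] = value.strip()
        else:
            kind, tag = line.split(None, 1)    # a block header with its tag
            if kind == "Evidence":
                unit["tag"] = tag
                current = unit["properties"]
            else:
                current = unit[kind].setdefault(tag, {})
    return unit

unit = parse_evidence(SAMPLE)
print(unit["tag"], sorted(unit["Person"]), sorted(unit["Place"]))
```

Nothing here joins pWilliam to any similarly-named person in another unit of evidence; the result is just the contents of one unit, keyed by its tags.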

If there are several units of evidence - as in the example of Ian Goddard - then the reasoning and conclusion narrative can use the tags such as pWilliam to point to the relevant data without cluttering up the case being made.

I imagined that narrative could be in the form of short paragraphs, or even sentences, each making a distinct point.
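
Since the narrative points at the evidence purely by tag, a test case could be checked for tags that are used but never defined. Here is a small Python sketch of that check. The narrative sentence is invented for illustration, and the tag pattern (a "p" or "pl" prefix followed by a capital letter) is only an assumption drawn from the tag names in the example above.

```python
import re

# Tags defined in the evidence unit.
defined = {"pMelicent", "pWilliam", "pHellen", "plGrantham"}

# A hypothetical narrative sentence; note the misspelt tag "pHelen".
narrative = ("pWilliam registered the birth of pMelicent; "
             "pHelen is named as the mother.")

# Assumed tag shape: "p" or "pl", then a capital letter, then word characters.
TAG = re.compile(r"\bpl?[A-Z]\w*")

referenced = set(TAG.findall(narrative))
undefined = sorted(referenced - defined)
print(undefined)  # ['pHelen']
```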

Tony

ttwetmore 2012-05-14T10:05:46-07:00

TestUser said, "I see you are putting the Source inside the Person. I guess this is an alternate way, as long as the source really is only about one person. It definitely is "smooth" for a human looking at the file... But might this be a little inconvenient for the programmer?"

I think it is important to deal with two kinds of sources. The "one-of's" like a note from grandmother, and a big one, like a book or a register.

In DeadEnds I want the user to be able to either have the source be self-contained information at the place where the source reference is required, or have the source reference point off to a separate, self-contained source record. DeadEnds allows both approaches. In this case, for simplicity, I chose to keep the sources self-contained in the persona records. I could have done the example with external source records instead.
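
A rough Python sketch of what supporting both approaches could mean for a program reading the data is given here. It is an illustration only, not DeadEnds code: the record layout, the id `s1`, the register title and the function `resolve_source` are all hypothetical.

```python
# Independent source records, keyed by id (hypothetical content).
sources = {
    "s1": {"type": "register", "title": "Hypothetical parish register"},
}

# One persona embeds its source; the other points at a separate record.
persona_inline = {
    "name": "John /Doe/",
    "source": {"type": "note", "author": "Jane Doe"},
}
persona_linked = {
    "name": "Johann /Doe/",
    "source": "s1",
}

def resolve_source(persona, sources):
    """Return the source information whether it is embedded or referenced."""
    src = persona["source"]
    if isinstance(src, str):   # an id pointing at an independent record
        return sources[src]
    return src                 # self-contained source information

print(resolve_source(persona_inline, sources)["type"])
print(resolve_source(persona_linked, sources)["type"])
```

The cost to the programmer is the single type check in `resolve_source`; after that both kinds of source look the same to the rest of the program.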

Tom

ttwetmore 2012-05-14T22:58:09-07:00

Tony,

Your example is very interesting and leaves me with a very positive opinion. The only significant difference between your model and mine seems to be the extra evidence layer, which acts to provide a "bag" of personas and places and the source information that describes where the evidence is from.

If there are not going to be independent source records then your solution is more efficient because the personas and the places in the evidence bags don't need explicit references to the source information since they get it all for free by simply being in their bag.

If there are only going to be independent source records then my solution is more efficient, especially when many personas are extracted from the same sources.

If we support both independent source records and source information embedded in personas and places (or in evidence bags as you do it), which is the DeadEnds approach, then how we trade off between those two types of sources will determine which solution is more efficient.

I think your solution is close to Louis's preferences. Both our solutions have full bodied personas which is absolutely critical in my opinion.

In the DeadEnds model I also allow places to be independent records or self-contained structures within other records. In my example I made all the places internal substructures. In your example you made all places independent records. (I'm defining "independent record" to mean anything that needs to have an id so it can be referred to from an external place.) I could have made Ourtown, Alabama, and Histown, Massachusetts, two independent place records, but to limit the number of records in the example I chose not to.

From this you can see that I have no qualms with allowing the same information to be represented in more than one way depending upon user preference. There was a Better GEDCOM requirement that said this was a bad thing. I'll never agree with that!

Tom

ttwetmore 2012-05-14T23:13:30-07:00

Testuser42 said, "I can guess at how the output would look like after every step, before you have any of the later information. Maybe you could still show how the conclusion person would look after step 4 (if you concluded that this "Johann" was the "John" you're looking at, but still recording your low confidence) and then after step 5 (where you discard the last conclusion because of the new evidence. Maybe keep a note to remind you of the wrong lead, to follow up with more definite evidence)"

In my opinion the researcher can wait until there are many persona level evidence records before ever committing to join subsets of them into conclusion persons. From your comment I get the idea that you think I would create a conclusion person immediately after talking to Grandma, and that at each step along the way I would update that conclusion person by either linking to another persona or by removing a link to a persona that I now believe does not pertain to the conclusion person.

I believe that either approach, and every one in between is acceptable, and whichever is chosen would be based on both the details of the cases being evaluated and on the preferences of the users. What I would want to insist on is that the user have the flexibility to do it any way they think best.

I don't know at what stage in the research process described by the example I would have decided to first create the conclusion record for John Doe. I think I would have created it right at the beginning, because the first persona came from a direct descendant of the person of interest. That is, in the example case, you can be nearly 100% sure (except that Grandma could be senile and have completely forgotten everything about her grandfather) that the first persona record that you collected pertains to exactly the person you are interested in. Even though Grandma is not sure of the exact details of the vital events, she is sure of the existence of the person, and that is much more important than having dates and places.

Maybe I should try to do as you suggest, as I would indeed almost assuredly have been updating the conclusion person as the facts unfolded.

Tom

louiskessler 2012-05-15T00:07:14-07:00

Testuser, et al.

I have posted on my blog what I propose Behold, with source-based data entry, would be like and how it would handle your six sources step-by-step:
http://www.beholdgenealogy.com/blog/?p=1094

That will help you see how the data is processed by Behold, but to me that doesn't have any relationship to how it would be exported. Behold would be able to export to BetterGEDCOM, GEDCOM X or even GEDCOM 5.5 (with just a few custom tags).

Tony: Not sure if my ideas are at all compatible with yours.

Tom: Our acknowledged disagreement is over personas. No need to rehash.

Louis

testuser42 2012-01-15T13:11:40-08:00

I'll try a simple example for the step-by-step idea:

1. Source: Conversation with Grandma

Person of interest: John Doe (her grandfather, my g-g-grandfather)
PFACTs:
- Death: between 1904 and 1906 in Ourtown, Alabama (she was not yet in school, and she started school in 1906)
- Birth: around 1830 in Histown, Connecticut (he was in his seventies)
Confidence: not so high, Grandma wasn't so sure herself

2. Source: Handwritten note by Jane Doe, daughter of John Doe

Person: John Doe
PFACTs:
- John Doe died of a heart attack on the 3rd of April 1905, in the front yard of his home in Ourtown, while carrying groceries he just brought from the store.
File: scan of the note.
Confidence: pretty good.

3. Source: Copy of death cert for John D Doe

- John D Doe died 1905-04-03 at 11 a.m. in Ourtown, Ala.
- aged 74
- Cause of death: Heartattack, confirmed by Dr Doc.
- Handwritten note: "Buried 3 days later, St. Michael's"
File: scanned document.
Confidence: looks official, but have to find out more about the burial.

4. Source: Birth cert of Johann Doe

Note: this is the only J. Doe to be found in Histown, Conn that was born in 1830 +-1
Person: Johann Doe
- Born 21st December 1830
- Parents Hans and Marie Doe
- in Histown, Conn
- baptised 25th Dec, catholic
File: scan
Confidence: The data is trustworthy, and it's a really good fit. Could be my John D Doe, but needs more proof!

5. Source: Birth cert of John Dorian Doe

Grandma found this in a drawer with other family documents.
Person: John Dorian Doe
- Born 3/3/1830
- to Stephen Doe and his wife Hilda, née Schmidt
- in Histown, Massachusetts
- he was their 2nd child
File: Scan
Confidence: quite high.
Note: Grandma says that she wasn't sure about the "Conn" anyhow.

6. Source: Death of Johann Doe

Article in the Histown (Conn) Paper, 8th Sept 1834
"A fire consumed Doe's smithy yesterday, one of the oldest buildings in town..."
"the blacksmith Hans Doe couldn't save his youngest son, Johann, age 3 and 3/4..."
Note: this is to show that Johann Doe of Histown Conn is not my John D Doe.
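
The reasoning in step 6 can be sketched as a simple date check, in Python. This is only an illustration under assumptions: the fire date of 7 Sept 1834 is read from "yesterday" in the article dated 8 Sept 1834, and the function name `can_be_same_person` is hypothetical. The test is deliberately crude: it only compares the two known death dates.

```python
from datetime import date

# Death of John D Doe, from the death certificate in step 3.
john_death = date(1905, 4, 3)

# Johann Doe: born per step 4; died in the fire the day before the
# article of 8 Sept 1834 (step 6).
johann_birth = date(1830, 12, 21)
johann_death = date(1834, 9, 7)


def can_be_same_person(death_a: date, death_b: date) -> bool:
    """Two records can describe one person only if the death dates agree."""
    return death_a == death_b


# Age of Johann at the fire, in years, for comparison with "age 3 and 3/4".
johann_age_years = (johann_death - johann_birth).days / 365.25

print(can_be_same_person(john_death, johann_death))
print(round(johann_age_years, 2))
```

The age calculation is there to confirm that the child in the article fits the birth certificate from step 4, which is what lets the article rule Johann out.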

testuser42 2012-01-15T13:15:48-08:00

Are the above good for a starting test case?
If not, can we tweak them?
If they're OK, let's put them on the page and then see how the data would look in DeadEnds, Behold's model, STEMMA, etc.

ACProctor 2012-01-16T09:28:31-08:00

Thanks for starting this thread. We should have a stock of these examples in an easily found location, and associated implementations and discussion under each one.

Is this the best place though, given that discussions are not backed up? We don't have any hierarchical support on this wiki otherwise.

The phrase "catch 22" comes to mind :-(

Tony

testuser42 2012-02-18T07:27:36-08:00

I think we can use a regular page (which does back up) and just simulate a hierarchy with indentations. Kind of like the Wikipedia discussion pages. I tried to do something like that for the List of main Citation Elements.

louiskessler 2012-02-18T08:46:52-08:00

That's a good idea. I could write up what I believe my evidence/conclusion modeling would look like.

But your examples above are all too simplistic. They are all simple pieces of evidence that can be handled similarly. Only the conclusions differ a bit, and a conclusion really only has to be a text string. So those examples don't show the failings in any system that disaggregates data.

You should add examples of a Census record which includes multiple generations and side-relatives of a family with piles of info in the one record.

Or how about something like a seating list at your grandfather's wedding which shows one interesting table of people sitting together that leads you to believe they may have been relatives of each other or had some other relationship (friends, coworkers).

How about a clipping about a town fire you didn't know about that destroyed your ancestor's neighbors' homes and that makes you believe it affected their lives, causing them and possibly other relatives to move to a different city. But nothing specific about your ancestor is mentioned in the article.

Louis

testuser42 2012-05-13T12:28:18-07:00

Hi again...
Louis, sure, more complex examples are definitely needed, too.

The idea behind the list above (Jan 15) was that these simple examples would make it easier to see how the data is processed and exported. I'm thinking of a hypothetical program in which you enter the data of the first step. Then you export to "BG". Then you add more, export again. Repeat... A "step-by-step" example might show where the differences are between an n-tier model and a 1 or 2-tier model. The first step would probably result in pretty similar BG output. But when more evidence is added, the models start to differ. When conclusions are made, how do these look? And when a conclusion is unmade?
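
A rough sketch of that enter/export/compare loop, in Python. Everything here is an assumption for illustration: the `export` function just dumps JSON as a stand-in for a real "BG" exporter (no such format is defined yet), and the three steps are much-reduced versions of the first three sources in the list.

```python
import difflib
import json

# Hypothetical, much-reduced versions of the first three steps.
steps = [
    {"source": "Conversation with Grandma",
     "facts": {"death": "between 1904 and 1906, Ourtown, Alabama"}},
    {"source": "Handwritten note by Jane Doe",
     "facts": {"death": "3 April 1905, Ourtown"}},
    {"source": "Copy of death cert for John D Doe",
     "facts": {"death": "1905-04-03, Ourtown, Ala.", "age": "74"}},
]


def export(entered: list[dict]) -> list[str]:
    """Stand-in for a real exporter: serialise the entered data as text lines."""
    return json.dumps(entered, indent=2, sort_keys=True).splitlines()


def export_diffs(all_steps: list[dict]) -> list[list[str]]:
    """Export after every step and diff each export against the previous one."""
    diffs = []
    previous: list[str] = []
    for n in range(1, len(all_steps) + 1):
        current = export(all_steps[:n])
        diffs.append(list(difflib.unified_diff(
            previous, current, f"step {n - 1}", f"step {n}", lineterm="")))
        previous = current
    return diffs


for diff in export_diffs(steps):
    print("\n".join(diff))
    print()
```

Running the same steps through exporters for different models and comparing the per-step diffs would show where, and at which step, the models start to diverge.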

I had you and Tom in mind... How would Behold handle these simple cases, how would the exported BG look? And what would DeadEnds do? Or Tony's STEMMA? Or the current GedcomX, if anyone is able to answer that already. These are the very basic building blocks, I think we should compare and understand them before going to the possibly harder details.

What should be in a BetterGEDCOM Testsuite? (please edit / add your ideas!)
