Adrian Bruce's Data Model Workspace - Evidence and Conclusions
- Elizabeth Shown Mills refers to a process of Analysis that has input Evidence and output Conclusions;
- The “evidence and conclusion” mode of working enables the permanent and separate recording of the input Evidence to an Analysis, and output Conclusions from that Analysis;
- The “evidence and conclusion” mode of working is entirely optional in the BetterGEDCOM data model;
- In the “evidence and conclusion” mode of working, we propose that input Evidence and output Conclusions are recorded on separate Person records;
- If a person record is used as input to an Analysis, it is referred to as an “Evidence Person”;
- If a person record is used to record output from an Analysis, it is referred to as a “Conclusion Person”;
- A “Conclusion Person” can then be used as an input to the next Analysis and from then on is therefore both a “Conclusion Person” and an “Evidence Person”;
- When repeated Analyses give rise to a chain or tree of Person stacked on Person, the current working hypothesis is the Person at the top of the chain.
The vast majority of family historians hold one record per person in their data files. As new sources come in, we mentally select data from those sources and analyse it. If we conclude that the new information belongs to one of our existing people, we enter the details of the source into a new Source record, add the new information to our existing person and, if we’re good, enter a citation and record some notes about our analysis somewhere.
The records for this person on our file therefore contain all the current conclusions for this person. Hence we can refer to this as a “conclusion person”. But don’t get hung up on the finality of the word “conclusion” – it’s really a “current working hypothesis” for the person that we have in our files. But Elizabeth Shown Mills uses the term “conclusion”.
Suppose I choose the wrong 1871 census record for my 4-greats grandmother and then, after some time, realise the error of my ways. How do I decide what to delete from my file? Obviously the 1871 census event goes. But will I remember that I’ve prematurely killed off my 4-greats grandfather because I saw that the erroneous person was a widow? Or is there other evidence that does kill him off? And what about the 1881 census for 4-greats grandmother? Is that one really her? Or is it the erroneous 1871 person again, identified using data collected from that incorrect 1871 census record?
It would be nice to select the 1871 census record and follow through the consequences of getting it wrong: where was the 1871 record itself used? Not only that, but where have we used any information that depended on that 1871 record? Did our argument for saying the 1881 census was her depend on the 1871 record or not?
The problem is that if we have all the data for 4-greats grandmother held in one place, against one person, it gets difficult to disentangle everything. But, if we had the data in some sort of a tree (say), we would find the problem census record and follow it up the tree, seeing where it was used, not just directly, but indirectly as well.
It is the aim of the “evidence and conclusion” idea to allow someone to construct just such a tree, to allow them to trace what might be called into question if we find an error.
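The kind of tracing described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Person` class, its `points_to` field and the `affected_by` function are hypothetical names, not part of any BetterGEDCOM specification, and the record labels are invented for the 4-greats grandmother example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    """Hypothetical person record; the field names are illustrative only."""
    label: str
    points_to: Optional["Person"] = None  # the record that was built using this one


def affected_by(record: Person) -> List[str]:
    """Follow the pointers up the tree and list every record that used `record`,
    directly or indirectly."""
    affected = []
    current = record.points_to
    while current is not None:
        affected.append(current.label)
        current = current.points_to
    return affected


# The 1871 census record later turns out to belong to the wrong person.
with_1871 = Person("baptism + 1871 census")
with_1881 = Person("baptism + 1871 census + 1881 census")
census_1871 = Person("1871 census", points_to=with_1871)
census_1881 = Person("1881 census", points_to=with_1881)
with_1871.points_to = with_1881

# Everything listed here may need to be re-examined.
print(affected_by(census_1871))
```

Starting from the suspect 1871 record, the walk reaches both the record that used it directly and the later record that was built on top of that one, which is exactly the indirect dependency that is hard to see when everything is held against a single person.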
Many of us would like the BetterGEDCOM data model, and the corresponding physical file formats, to be able to accommodate the “evidence and conclusion” ideas.
Q: “If I use an application program that supports BetterGEDCOM, do I have to use evidence and conclusion ideas?”
A: Absolutely not. As I said above, the vast majority of family history files don’t use them now, and we’ve got to be able to convert those files into the new BetterGEDCOM file formats for import, export or storage. We can’t tell everyone to re-enter all their data in “evidence and conclusion” mode. Also, the majority of family historians just aren’t going to be interested in working like this; they’re content to enter all their conclusions (working hypotheses) against one person record.
So the BetterGEDCOM data model and the corresponding physical file formats must be able to accommodate working in both “evidence and conclusion” mode and the old-fashioned way – let’s call it “conclusion only” mode.
More formally, the BetterGEDCOM data model must allow people to distinguish between evidence and conclusions but must not mandate them working in that mode.
I understand that Elizabeth Shown Mills writes “Sources provide Information from which we select Evidence for Analysis. A sound conclusion may then be considered Proof” (from “Evidence Analysis: A Research Process Map” in “Evidence Explained”) (my italics).
All of us do something like this, perhaps some more successfully than others. There are lots of possible processes that might describe how we work, some more helpful than others. I’ll use this process description because it helps to explain working in “evidence and conclusion” mode.
Like any process, this has sub-processes (“select [Evidence for Analysis]” and “Analysis”) and it has inputs and outputs. Let’s look at the “Analysis” sub-process. Its inputs are the Evidence that we’re going to analyse and its outputs are the Conclusions from that Analysis. When we are working in “evidence and conclusion” mode, we want to permanently record the Evidence input to the Analysis sub-process and the Conclusions output from that Analysis, keeping them clearly separate. And we need to record the logic of the Analysis as well.
Evidence is input to analysis, Conclusions are output from analysis
How do we record our input Evidence and our output Conclusions in the BetterGEDCOM files? Or other files equivalent to BetterGEDCOM?
Bearing in mind that we want to keep input Evidence and output Conclusions permanently separate, the proposal is this:
If the input Evidence is held in (say) one or more Person records, then the output Conclusions are entered into a completely new Person record that contains all the accepted data from the input Persons and excludes any data from the input Persons that our analysis has rejected.
If an application uses BetterGEDCOM as its native file format, then the proposal describes how the application program stores its data.
If an application uses BetterGEDCOM only to import and export data as a means of data transfer, then it must be able to read or write a BetterGEDCOM file that contains data with input Evidence and output Conclusions permanently separated on different Person records as per the proposal. Hence, it will probably do something similar on its own files.
- John Smith is first found as a 5-year-old in the 1880 census, living in New York. The information relating to John is manually extracted from the source, and a person is entered into the application’s file containing only that data from 1880;
- A baptism for a John Smith is found in New York in 1875. Is this the same John? We don’t know yet. The information relating to this John is extracted from the source, and another person entered into the file containing only that data;
- We then mentally select the information from those two records, because we believe that they apply to the same human being;
- We do some analysis on the selected evidence and come to the conclusion (or working hypothesis) that, because of something, these two records refer to the same human. (Not sure yet where this analysis should be recorded);
- The two persons mentioned above have provided input Evidence to an Analysis, and are therefore referred to as “Evidence Persons”;
- Using the application, we then manually create a third person named John Smith where we record the details from both the baptism and census that our analysis has concluded are acceptably correct and belong to the same person. This third John Smith records the output Conclusions from an Analysis process, and is therefore referred to as a “Conclusion Person”. Again, feel free to regard this as a “Current Working Hypothesis Person” if you feel that Conclusions is too final a term but note that ESM uses the word “conclusion”;
- As part of the creation of the output “Conclusion Person”, the application must mark up the input “Evidence Persons” so that they both point to the output “Conclusion Person”.
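The John Smith steps above can be sketched in Python as follows. This is a minimal illustration under assumptions, not a proposed file format: the `Person` class, its `facts` and `conclusion` fields and the `conclude` function are all hypothetical names, and the fact keys are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Person:
    """Hypothetical person record; field names are assumptions for illustration."""
    name: str
    facts: Dict[str, object] = field(default_factory=dict)
    conclusion: Optional["Person"] = None  # pointer to the output Conclusion Person


def conclude(inputs: List[Person], accepted: Dict[str, object]) -> Person:
    """Create a completely new Conclusion Person holding only the accepted facts,
    and mark up every input Evidence Person so that it points to the new record."""
    result = Person(inputs[0].name, dict(accepted))
    for person in inputs:
        person.conclusion = result
    return result


# Two Evidence Persons, each holding data from one source only.
census = Person("John Smith", {"age in 1880": 5, "residence in 1880": "New York"})
baptism = Person("John Smith", {"baptism": "1875, New York"})

# In this example the Analysis accepts everything from both records.
accepted = {**census.facts, **baptism.facts}
john = conclude([census, baptism], accepted)

print(john.facts)
print(census.conclusion is john, baptism.conclusion is john)
```

The Evidence Persons are left untouched apart from the new pointer, so the input Evidence and the output Conclusions stay permanently separate. Where the notes on the Analysis itself should live is left open here, as it is in the text.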
If we then find the 1900 census for our John Smith, we go through a similar Analysis process, with its own input “Evidence Persons” and its own output “Conclusion Person”. One input “Evidence Person” will contain only the data from the 1900 census. The other input “Evidence Person” will be the current working hypothesis for John Smith, the one that contains details from the 1875 baptism and the 1880 census. Assuming we can show the new census belongs to the same John Smith, then our new output “Conclusion Person” will be another new person record containing details from the 1875 baptism and the 1880 and 1900 censuses.
We now have a tree of John Smith persons.
The John Smith record containing just details from the 1875 baptism and the 1880 census is now an output “Conclusion Person” from the first Analysis and, at the same time, an input “Evidence Person” to the second Analysis. Because the 1875 / 1880 John Smith now points to the 1875 / 1880 / 1900 John Smith, we know that the 1875 / 1880 John Smith is no longer the current working hypothesis. The current working hypothesis is simply the one at the top of the tree. The current working hypothesis person does not point to anyone else; that is how we can distinguish that this person is the current working hypothesis person.
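Finding the current working hypothesis from any record in the tree then amounts to following pointers until there is nothing left to follow. The sketch below assumes a hypothetical `Person` class with a single `points_to` attribute; neither name comes from BetterGEDCOM itself.

```python
from typing import Optional


class Person:
    """Hypothetical person record holding only a label and one upward pointer."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.points_to: Optional["Person"] = None


def current_hypothesis(person: Person) -> Person:
    """Climb the tree until we reach the record that points to no one else."""
    while person.points_to is not None:
        person = person.points_to
    return person


baptism_1875 = Person("1875 baptism")
census_1880 = Person("1880 census")
census_1900 = Person("1900 census")
john_1875_1880 = Person("1875 / 1880")
john_1875_1880_1900 = Person("1875 / 1880 / 1900")

# First Analysis: baptism and 1880 census feed the 1875 / 1880 John Smith.
baptism_1875.points_to = john_1875_1880
census_1880.points_to = john_1875_1880
# Second Analysis: that record and the 1900 census feed the newest John Smith.
john_1875_1880.points_to = john_1875_1880_1900
census_1900.points_to = john_1875_1880_1900

print(current_hypothesis(baptism_1875).label)  # 1875 / 1880 / 1900
```

Whichever John Smith record we start from, the walk ends at the same person at the top of the tree.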
And only one layer
Immediately after John Smith’s 1880 data has been entered into the file, there is only one person record in the file for him. At this stage there has been no Analysis, so is this person an evidence person or a conclusion person? It is proposed that:
This record does not point to any other person and is therefore the current working hypothesis person. Since a current working hypothesis person is usually a conclusion person, we can decide to call this person a conclusion person.
Also, since the person contains only evidence from sources, we can decide to call this person an evidence person.
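One possible reading of this proposal, written as a rule in Python, is sketched below. It is an interpretation rather than part of the proposal: the `Person` class, the `built_from` and `points_to` fields and the `roles` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Person:
    """Hypothetical person record; both link fields are assumptions."""
    label: str
    built_from: List["Person"] = field(default_factory=list)  # inputs to the Analysis that produced this record
    points_to: Optional["Person"] = None  # the Conclusion Person this record fed into


def roles(person: Person) -> Set[str]:
    """Return the roles a record plays under the reading described above."""
    found = set()
    # Output of an Analysis, or the current working hypothesis (points to no one).
    if person.built_from or person.points_to is None:
        found.add("conclusion")
    # Input to an Analysis, or holding nothing but data taken from sources.
    if person.points_to is not None or not person.built_from:
        found.add("evidence")
    return found


john_1880 = Person("John Smith, 1880 census only")
print(sorted(roles(john_1880)))  # ['conclusion', 'evidence']
```

Under this rule a lone record with no Analysis behind it plays both roles at once, while a record in the middle of a tree is both a conclusion (of the earlier Analysis) and evidence (for the later one).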