> Defining E&C for BetterGEDCOM
<< Under Construction (Brett & GJ 2012-03-05) >>
As agreed in the November 7, 2011 Developers Meeting, I am posting a link to the introductory discussion topic on a document I wrote on May 20, 2011 about Evidence and Conclusion: Geir's Working document
The document describes a way to merge a model for Multilevel Evidence&Conclusion persons with a model that keeps Evidence and Conclusions separate but linked via Citations. The document also considers backwards compatibility with GEDCOM, with Personas in the GenTech model, and interworking between programs that have implemented different parts of the model. There has been extensive discussion of the document (see the discussion tab of this page), but the document has not been updated to reflect those discussions (I have an updated version reflecting some of them.)
The original content of this page follows.
The 11 April 2011 Developers Meeting was spent discussing Tom Wetmore's concern that BetterGEDCOM was progressing on various fronts but hadn't made a commitment to E&C. Some of Tom's views were earlier expressed in his posting "The Chasm"; see specifically his posting here
At the time Tom's concern was expressed, BetterGEDCOM had been working to develop requirements related to the EE & GPS Support part of the wiki, somewhat on the heels of the requirements that had been developed for Administrative Research.
By the conclusion of the 11 April 2011 meeting, it was thought progress toward other requirements should be suspended temporarily while the team focused on E&C with an open mind.
"Defining E&C for BetterGEDCOM" is a page to further the discussions held during the meeting: to determine how this effort should be organized and conducted so that a definition can be made and a decision reached, to catalog links to some discussions about E&C on the Wiki, and to record the progress we make toward that definition and decision.
Brief working description of E&C Model (concept)
Week 1 - Progress
Discussion: What will it take to get BetterGEDCOM E&C defined?
There is a need to summarize the topic discussed for each link.
A developing resource. (Awaiting Tom's additions)
- DeadEnds XML Model Tom's initial document on E&C. The model contains the data structures (not only for E&C), but almost nothing about rules for how these records shall be used.
- Discussion of the DeadEnds model (Some of these should be carried forward as separate links, and the topic summarized.)
- BetterGEDCOM Requirements Catalog, "Evidence 01 Evidence & Conclusion Model" and related discussion
- Adrian's "Research Process, Evidence & GPS" (and related discussions)
- BetterGEDCOM Comparisons discussion "Question re..." [ignore the title and read pages ?3 & 4]
- BetterGEDCOM's "Evidence and Conclusion Process and why it is important to BetterGEDCOM."
- Mike's "BetterGEDCOM Attempt" (and related discussions)
- Build a BetterGEDCOM Blog, "GenTech"
- Build a BetterGEDCOM Blog, "How do scholarly genealogists approach the evidence process?"
- Possibly the original discussion about E&C. "Automatic Combination of Genealogical Records" (10 Nov 2010; BetterGEDCOM Home discussion).
- The Chasm: Based on certain of Tom's assumptions, the posting suggests E&C-modeled practices are superior to other practices, even other best practices. Drawing on an article in the Ancestry Insider blog, it touches on GenTech/NFS, the research process, and where to store evidence.
- Brianjd (BetterGEDCOM) Evidence and Confidence (3 Dec 2010)
- BetterGEDCOM's Direct Model Support for the Evidence and Conclusion Process (11 Mar 2011)
- Testuser42's graphic comparing a Multi-Level, a 2-Level and a Single-Level Model: multilevel2.pdf
and the older comparison of Multi- to Single-Level Models: multilevel.pdf
- Another graphic by testuser42, displaying a Multi-Level-Model example with Events: multi-events.pdf
- Thoughts about undoing a conclusion: remove-undo-permutations.pdf and remove-undo-steps.pdf
Have we made any progress this first week?
(pulling this discussion over from the Developer's meeting page)
You wrote, "allay your concerns."
As much as it might appear otherwise (I just hate being logged out), my concern is less about specifics and more about progress in developing the materials by which some objective review can be made to support decision making about BetterGEDCOM E&C.
See the comments in the opening to the discussion and on the related wiki page. See also Geir's comments:
Pushed for time.
PS So many fine points have been made in your postings and all the others in this thread. I hope later this week to be less pressed for time and in a better position to respond to each.
P.S. I raised concepts of "evidence persons" and "snippets/logic and reasoning" thinking we might consider a topical outline as an approach to developing the materials referred to just above.
I can think of other approaches too.
There is a decision making process that will work for E&C--let's make a plan and get 'er done. --GJ
Objectively, what decision? Or, what review?
We're creating a requirements catalogue here. Decisions about the contents of requirements catalogues are made under several circumstances:
- when requirements contradict;
- when requirements fail a cost / benefit test;
- when requirements work against the strategic or tactical direction of the company, possibly as laid down in standards or legislation;
- when requirements cannot be developed to a meaningful conclusion;
And probably some more besides.
Hey - maybe this is the list you need?
If I run the E&C requirements against those criteria:
- when requirements contradict - I know of no contradiction, PROVIDING nothing goes into the E&C model that mandates the use of things like personas / evidence persons / etc. I have seen nothing like that yet - the E&C Model contains the C-only Model;
- when requirements fail a cost / benefit test - not relevant here;
- when requirements work against the strategic or tactical direction of the company, possibly as laid down in standards or legislation. The equivalent here could be the contradiction of the BCG standards - again PROVIDING the data model does not mandate specific ways of working, I know of no way that it works against them;
- when requirements cannot be developed to a meaningful conclusion - my belief, as an IT specialist (ex-specialist?) is that the E&C MODEL is, in essence, done. Questions like "Where do the proof arguments go?" (very important questions) actually apply across the board and not just to E&C;
There is, and Louis may not care for me bringing this up, one requirement that might cause us issues, and that is
"Syntax06 - Define one way of doing a thing"
"BetterGEDCOM should define just one way of doing one thing"
It might be thought that having a BG data model that accommodates both Evidence & Conclusion Models AND Conclusion-only Models is in contradiction of Requirement Syntax06. I believe this is not the case. The 2 are not in themselves in contradiction since one lives inside the other. Nobody said anything about using BetterGEDCOM in only one way - simply that if something from one of those methods is entered into a BG-compatible database, then there should be only one way of recording the data item. One place to enter a citation. One place to enter a note. And so one place to enter a proof argument if it's a note, and another one place to enter it if it's a citation.
(Louis may not be convinced. Heck - I'm not sure I'm convinced. All I know is that we have to accommodate all sorts of genealogists)
I thought the original concern was about the amount of progress that had been made on E&C, not about the validity of the E&C requirement. If it is the latter, then maybe the phrases above will help you to assess the validity.
As far as progress goes, my advice to you all as an IT (ex-)professional is that the E&C model AS SUCH doesn't require much work on it other than some more words describing what happens when data is added or modified or deleted. Have we covered off all the circumstances?
There are other areas, such as the recording of research plans and logs, proof arguments, etc, and the integration of those with events, attributes and relationships in the rest of the database that are further behind. (Yeah, that's me mentioning "proof" again).
In summary, yes, I'd like to make a plan - but I'm not sure what this plan is supposed to do in relation to specific objectives.
In the last Developers Meeting we discussed the proposal that all other work on the Wiki (Requirements Catalog and EE/GPS Support, etc.) should stop until a decision had been made on E&C.
Exactly what form that decision was to take (my term) wasn't decided. Nor did we decide how to organize the effort about how to advance beyond sporadic discussions (my term) toward a decision.
I won't say this as well as someone else might, but at the most practical level, it seemed decisions on E&C could have ramifications on other requirements. Likewise, progressing too far on other requirements might impinge on decisions about E&C.
So, we stopped the other development discussions that had been progressing--but the idea was we wouldn't be stopped forever.
À la my comment above, let's make a plan.
I know there are other points in your post. I'll try to connect after the meeting.
Humm... waiting for meeting to begin.
I did not read the last part of this discussion before Tom left.
As I stated in the meeting, I don't think it was proper to start a discussion about IF E&C should be in BG, the question was HOW to proceed.
The result is now that some people will go work on other short term things, and the rest of us will try to help Mike on E&C. Unfortunately without Tom.
I haven't had a chance to review all of the comments in the thread; however, I sure didn't see it as "either-or" but what and how (decisions as opposed to decision).
I know I owe you many responses.
But to your list, I would add:
Does a body of material exist to support a general understanding of what the E&C model is?
While applications will be responsible for their implementation of this model, genealogy is a much-talked-about subject. At the end of the day, users are not shy with their opinions.
The closest I can come to materials about the model is GenTech--but it's posted over and over again on the wiki that we should ignore it.
It doesn't help that for those who don't, the first part of GenTech says, "now that we see it, we don't like it." A little further into the report, it's suggested that perhaps it won't benefit any users.
I know it's "not GenTech" because I've been told that. It's a two-edged sword, though: if it's not GenTech, then where is the documentation, review and comment about "E&C"?
Does this help?
It's scattered throughout this Wiki. E.g. I wrote http://bettergedcom.wikispaces.com/evidence%20vs.%20conclusions
There's the discussions behind
Obviously(?) the discussions behind
(This illustrates a source of confusion in early Wiki days - the Evidence & Conclusion Process was eventually characterised as the process we go through using evidence to create conclusions. One we all go thru. The Evidence & Conclusion MODEL is about how that data gets stored in the program's database - specifically that you can always see the evidence input to the analysis stage and the conclusions output, and there is never a "destructive merge" of people's data.)
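To make the "never a destructive merge" point concrete, here is a minimal Python sketch. The names `Person`, `merge` and `unmerge` are hypothetical, chosen only for illustration: a conclusion is a new record that points at the persons it was built from, so undoing it just means discarding that record, and the evidence persons come back untouched.

```python
from dataclasses import dataclass, field


# Hypothetical record shape, not an agreed BetterGEDCOM schema.
@dataclass
class Person:
    pid: str
    facts: dict = field(default_factory=dict)     # e.g. {"baptism": "1815, St Mary"}
    children: list = field(default_factory=list)  # persons this one was concluded from


def merge(new_id, *persons):
    """Record a conclusion as a new person that points at the merged persons.

    The inputs are referenced, never modified or deleted, so the evidence
    stays visible and the merge is not destructive.
    """
    return Person(new_id, children=list(persons))


def unmerge(conclusion):
    """Undo a conclusion by discarding it; the evidence persons survive intact."""
    return list(conclusion.children)


baptism = Person("E1", {"baptism": "1815, St Mary"})
marriage = Person("E2", {"marriage": "1840, St John"})
john = merge("C1", baptism, marriage)
```

Because the conclusion only adds a record, "undo" never has to reconstruct anything.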
Searching for "evidence-person" will give several pages of results.
(I have to say that cleaning up the navigation column has meant it's now more difficult to find stuff!)
or version 1
I have been able to understand the concept of E&C by reading these documents and the discussions here.
I'm not a native English speaker and not a programmer (but still a bit of a nerd) and I had not looked at a GEDCOM file before this project started.
I'm also only a hobby genealogist, maybe just past the beginner stage, who's only ever read half of one book about genealogy.
So, if even I can get E&C, I believe others should be able to get it, too. Nobody here is stupid.
Yes, it involves things that look like code - but that's just a very precise language.
It will take time to grasp the code-y stuff, but until then, please, believe the techy types when they all say that E&C does not do any harm for any way of doing genealogy. The opposite is true: only an E&C Model supports all the ways of doing genealogy.
Please don't be afraid.
That's a good question, and we should always check our decisions and ideas against this question. But Adrian has answered why these two often intermingle: Some of us are not satisfied with our software, and we see the problem boils down to GEDCOM's way of organizing data. A BG would free software developers so they can do more powerful things and don't have to let GEDCOM compatibility slow them down.
Personally, I stumbled on this project because I was a little frustrated with my software, and wanted to see how to get my data into another. I found out the transfer was never going to be easy, and it's not only my software's fault.
I also grew impatient with the way I could record my research in my software. With some research subjects, I'm at the "chasm", or it feels like that. I want to record all the pieces of data and their sources and what I think about both data and sources. I also want my "working hypotheses" stored and easily understood. I want to be more scientific, and have all that preserved for other softwares or other researchers.
Most of this can be done in really good software; maybe such software already exists. But I know that without an up-to-date, powerful replacement for GEDCOM, I would be stuck with that software forever, and the data would never transfer if I wanted to move it. I don't like being held hostage...
So now I pray that we manage to keep our cool and work out a BetterGEDCOM that is so great that developers will want to use it :)
He's the one who has thought longest about data model solutions to genealogy problems. The areas that he hasn't thought about much (like research logs or citation templates) are already being filled by the good work others are doing here. I felt we were getting somewhere good.
Do we still have a chance of making this project work? Defining a data model is after all a very technical project, and it's not good to chase away any techy person that's pitching in. Luckily, Tom documented his model quite well and explained his ideas repeatedly so that the basics have become clear. The devil is often in the details, and we may have to go after the details without Tom's direct input.
Also, Tom is one of the small application developers who in my view would be most likely to adopt the new BetterGEDCOM. Louis and Mike will hopefully support it, and maybe GRAMPS if we've not scared them off with our bickering. Some other developers have checked in from time to time, like Ben Sayer (Lineascope) and Christoffer Owe (Cosoft). Maybe they would be interested, too. So that would be a small nucleus of software that can then start to "radiate" the benefits of BetterGEDCOM.
Stuff that could go into the page as early as possible
- Create a drawing of an E&C tree (just saw testuser posting something)
- Describe the basic functionality
- Describe that Evidence Persons is an optional feature; not every program needs to change (one of possibly several requirements)
- Links to old discussions/pages, probably with a short statement about what has been discussed
The page could also have an issues list (random sequence below)
- Rules for entering info from sources (these are probably clear already)
- Rules for entering data in a conclusion person and where to put the conclusion statement; why are these the same? (There have been many discussions, probably two views regarding whether to copy info upwards or not, and what to do in various situations, e.g. conflicting evidence.)
- Relation to a source/citation model
- Relations to an administrative model
- What are the differences in what the Gentech model can do and what a multilevel E&C model can do. What are the benefits/drawbacks of each model?
- Interworking with other models, both ways: GenTech/NFS and existing Conclusion-only, one-level systems (possible problem with compressing e.g. a 3-4+ level tree into a 1-level one, because the context represented by a level could be lost if we are not careful)
- Will it work if users create Evidence Persons for some Persons and only Conclusion persons for others (may not be a problem?)
- How can the model be used (and perhaps adapted) to support various ways of working, based on how data would be recorded for each method.
- What about E&C for other things than persons?
It would perhaps be possible to discuss these in separate discussions. Also, some of the issues are already discussed at length, but we need to sum up.
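The interworking concern about compressing a 3-4+ level tree into a single level can be illustrated with a small sketch. It assumes a hypothetical `Person` record with facts, an optional source, and links to lower-level persons. Flattening keeps every fact together with its source, but the intermediate conclusions (which evidence was grouped with which) do not survive.

```python
from dataclasses import dataclass, field


# Hypothetical record shape for illustration only.
@dataclass
class Person:
    pid: str
    facts: dict = field(default_factory=dict)     # fact type -> value
    source: str = ""                              # only evidence persons have one
    children: list = field(default_factory=list)  # lower-level persons


def flatten(person):
    """Collapse a multi-level tree into one list of (fact, value, source) rows.

    Every fact from every level is kept with its source, but the intermediate
    conclusion persons themselves (the grouping) are not represented.
    """
    rows = [(k, v, person.source) for k, v in person.facts.items()]
    for child in person.children:
        rows.extend(flatten(child))
    return rows


e1 = Person("E1", {"birth": "1815"}, source="Baptism register")
e2 = Person("E2", {"birth": "1816"}, source="1851 census")
e3 = Person("E3", {"death": "1870"}, source="Burial register")
c1 = Person("C1", children=[e1, e2])   # level 2: E1 and E2 are the same man
top = Person("C2", children=[c1, e3])  # level 3: ... and so is E3
rows = flatten(top)
```

The flat rows could be exported to a one-level program, but nothing in them records that C1 was a separate step in the reasoning.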
Please, start throwing stones at the above! - or add to it.
"...are really trying to design a new genealogy program instead of a new import/export data model."
You would not be alone in that concern.
I just Googled, (Genealogy "records based methodology"), without the parens.
There were 13 returns. Three of those are postings to BetterGEDCOM. There are several "Keeping Families of heroin addicts together,"
and a few other public health entries.
Now, I've read Ancestry Insider's "Chasm" entry several times. I've read your post about it on BetterGEDCOM several times.
I'm not sure why you don't believe I've crossed the "Chasm."
I fail to see how labeling me so brings us closer to the materials about E&C by which BetterGEDCOM can make an objective review. --GJ
Build a BetterGEDCOM Blog: What is research: Understanding the Kaleidoscope
Build a BetterGEDCOM Blog: What is research: Working with documents about a c1815 estate.
Build a BetterGEDCOM Blog: What is research: Outlining the contents of an American Revolutionary War pension file.
They Came Before: Mixing it up: the indirect evidence challenge
They Came Before: One Spoonful at a time ...
They Came Before: Any time you connect a Miller ...
They Came Before: Just in time for Christmas ...
They Came Before: Bits and Breadcrumbs
We are getting nowhere. You quote link after link but steadfastly refuse to answer the simplest of questions. I don't think you understand what I've been talking about, and I don't think you care to. I see no point in continuing.
Thanks for all the links to previous discussions you hunted down earlier. Reading those discussions helped me immensely in understanding what has gone on before.
I have a real concern about the BetterGEDCOM project being "done" any time soon if we continue to try to turn the Titanic around with the iceberg in sight.
The "Titanic" being Evidence Records and a collection of Facts / Events with their associated Citations. What you have been proposing is a "new way" of doing research (new being relative) and the recording of what we find.
Evidence Person, Conclusion Person, etc. may be nice to know and understand, and we probably should be going there, but WHY can't we address the specific issues at hand?
To me, the biggest issue NOW for this project is to address the sharing of Source Information and Citation Information between two applications.
That does not mean that we shouldn't be looking ahead, in fact we should. BUT this project needs to define the "easy stuff", the stuff that is Broken, get the vendors on board, then get them to help US move in a new direction. I think what you have been proposing all along, is a New Direction.
WE can even agree on the sharing of source information and citation information.
As someone who has written a page about process, perhaps I should attempt to explain it from my viewpoint.
Modelling the real world of families, people, locations, organisations, etc, is "easy" because we can look away from our desks and PCs and see how the things work in real life. Hence, we can talk easily about the relationships between these things.
When, however, we talk about the world of genealogical research, we are not talking about things that are so common and obvious. Sources, Repositories and Citations are fairly easy because they are universal so we all have a common understanding. Except even there, life gets tricky when we ask what the content of a citation is for an event or attribute.
When we get into concepts like Research Log, Proof Argument / Summary, etc., where not everybody has an application that covers these, and those applications that do cover them differently, then we need to sort out what those terms mean. And the simplest way of doing that (for me) is to describe a process for how those things are used.
So - I write a process to determine what the data is and I believe that is very necessary. The art, of course, as you both suggest, is knowing when to stop refining the process and turning it into procedures, algorithms and programs.
I appreciate your process and all of the work that you have done.
I think we need to do two things. 1) Identify and attempt to define what we know is broken, and define how to fix that, then 2) bring in the "new stuff". The Research Log, Proof Argument, the wonderful work that you and Tom have been doing.
If I have read the Second Life comment, the comments from Mike and GeneJ, as well as my own, I think we need to pause for a moment and see if we can attack and resolve the issues we know about, keeping in mind the new stuff, and try to help educate us End Users on why this new stuff is so important.
As you may know, this project started because two end users could not share, successfully, research information, more specifically Sources and Citations. We have plenty of examples on the Blog.
We more importantly need to get genealogy software developers to help us fix this problem.
My thought is if we get them hooked (on board) with fixing the Source and Citation issue (for example), they may be willing to come to the table to address the new research techniques that you all have been proposing.
Just a thought.
No worries. I've given up. If you've noticed all the people responding, however, you'll see that most people are thoroughly on board the move to add evidence. If we asked them whether we have to turn the Titanic around to get there I doubt many would agree.
You and GeneJ are opposed to the idea. You two are also the least computer savvy of the people actively participating here. I believe the main problems with the effort are trying to run it with a wiki and having it led by people with no technical background.
Better GEDCOM has noble goals and I hope it will be a success.
I am torn between the pragmatic short-term concern of sharing what we have already, including citation data, and the long-term view of enabling a data model to cope with the Evidence & Conclusion Model.
However, having thought long and hard and tested various concepts in the pages of this Wiki, let me give you my opinion of the relative difficulties:
- implementing the Evidence & Conclusion Model in a replacement for GEDCOM is EASY.
- implementing citations in a replacement for GEDCOM that can be exchanged between applications and printed out consistently in whatever style the recipient wants, is HARD.
Why is the latter hard? Because any initial analysis of the citation formats / templates / whatever in ESM's EE book starts with HUNDREDS of different elements. Somehow, I am (naively?) convinced, several hundred of those can be reduced to a couple of dozen variants on Author, Date, etc. Which still leaves another several hundred which may (or may not, I just don't know) be so dependent on the type of source document they come from, that they will never be reduced in number. And of course, the number of types of source document increases all the time.
Somehow, we have to concoct a model that enables the transfer of citation data - BUT the sheer volume and volatility of stuff makes it HARD to analyse, to design a solution and to update that solution when new documents come along. There might be a simpler way of doing it than hacking it all out item by item - in fact, there has to be. But what that method is, I don't know yet.
Whereas, as I say, implementing the Evidence & Conclusion Model in a replacement for GEDCOM is EASY. It really is. LDS have already done it, albeit in a truncated 2-level only form. Pretty much all it needs is the pointers from one person to another.
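As a rough illustration of "the pointers from one person to another", the sketch below uses GEDCOM-style cross-reference IDs in a plain Python dictionary. The `from` key is a hypothetical name, not a proposed tag. The same structure covers a truncated 2-level tree like the LDS one and deeper trees; only the depth differs.

```python
# Hypothetical record store: each person is a dict; "from" holds the IDs of the
# persons it was concluded from. Evidence persons have no "from" list.
records = {
    "@E1@": {"name": "John Smith", "source": "@S1@"},
    "@E2@": {"name": "Jno. Smith", "source": "@S2@"},
    "@E3@": {"name": "John Smyth", "source": "@S3@"},
    "@C1@": {"name": "John Smith", "from": ["@E1@", "@E2@"]},
    "@C2@": {"name": "John Smith", "from": ["@C1@", "@E3@"]},
}


def evidence_for(pid, store):
    """Follow the person-to-person pointers down to the evidence persons."""
    below = store[pid].get("from", [])
    if not below:
        return [pid]
    leaves = []
    for child in below:
        leaves.extend(evidence_for(child, store))
    return leaves


def depth(pid, store):
    """Number of levels in the tree rooted at pid (an evidence person counts as 1)."""
    below = store[pid].get("from", [])
    return 1 + max((depth(c, store) for c in below), default=0)
```

Here `@C1@` is a 2-level tree and `@C2@` a 3-level one, built from nothing more than ID lists.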
BUT BUT BUT, don't imagine that I'm also arguing that implementing the Evidence & Conclusion Model in an application is as easy. It won't be: there's a whole raft of extra navigation that has to be put into the software to go up and down the conclusion tree. Fortunately, this is not something we have to concern ourselves with. If the developers were starting from fresh it would probably be fairly straightforward, but as they'll all want to modify their existing software, it could be dodgy.
Actually - I'll tell you what I think the biggest issue is - it's not the Evidence & Conclusion Model, it's not creating a format for citations. It's answering this question - WHY should the vendors come on board?
I guess Tom is right. I don't know what I am talking about.
I am NOT saying that the Sources and Citation issue is Easy, but what I AM saying is that if you look at Evidence Explained!, you will see a series of Fields (field names), right? They are strung together.
Don't most databases understand fields and how to string them together? My reading of a "First (Full) Reference Note" is a series of Fields that are in a certain sequence. If you were to look at Roots Magic, for example, you can see those field names. Other programs don't present those field names, but behind the screens are those (probably) same field names.
<fieldname> text </fieldname>, in whatever format developers want (like levels in the current GEDCOM or some other set of rules), could be generated and received for presentation to the other end user.
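A minimal sketch of that idea, assuming hypothetical field names (`author`, `title`, `page`, `date`) rather than any agreed list: the sender writes each field as `<fieldname>text</fieldname>` using Python's standard `xml.etree.ElementTree`, and the receiving program strings the fields together in whatever sequence its own citation style wants.

```python
import xml.etree.ElementTree as ET

# Field names and values are hypothetical examples, not an agreed vocabulary.
citation = {
    "author": "John Doe",
    "title": "Parish Register of St Mary",
    "page": "folio 12",
    "date": "1815",
}


def to_xml(fields):
    """Sender side: one <fieldname>text</fieldname> element per field."""
    root = ET.Element("citation")
    for name, text in fields.items():
        ET.SubElement(root, name).text = text
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text):
    """Receiver side: recover the field name -> text mapping."""
    return {child.tag: child.text for child in ET.fromstring(xml_text)}


def render(fields, order, sep=", "):
    """Receiver side: string the fields together in its own preferred sequence."""
    return sep.join(fields[name] for name in order if name in fields) + "."


received = from_xml(to_xml(citation))
note = render(received, ["author", "title", "date", "page"])
```

The sequence is the recipient's decision, not part of the transferred data. The hard part described in this thread, agreeing on the field names themselves, is exactly what this sketch assumes away.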
I do understand, this is not as easy as it sounds.
Why should we have vendors on board? Maybe that is the real issue here. IF the Vendors are NOT on board with the project, how will I, a stupid end user, ever see a successful exchange of information? If the vendors are not on board, who will transform the data in my software into a BetterGEDCOM file, and who will recover that data when it gets to the other end?
What am I missing?
Note that all this is based on the idea that using E&C is an OPTION in the software. Even I would probably only use it occasionally, e.g., since I'm so far down the road of conclusion-only people that I can no longer easily see the logic that led me to believe that this guy over here being baptised is the same as that guy over there being married.
Your idea of concocting some sort of template for citations is probably the right way to go but there is a balance to be struck in there between sticking stuff, any stuff, into a template where it can be transferred but not understood (because there's no agreed, world-wide name or definition), and creating an official BG description for an item where the item is so important that all software needs to see an explicit definition of it.
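One way to picture that balance, sketched under the assumption of a hypothetical set of explicitly defined element names: elements with an agreed definition are understood by the receiving program, while everything else is carried along unchanged so it can be passed on again without loss.

```python
# Hypothetical: a small set of element names BetterGEDCOM might define
# explicitly; anything else is transferred but not interpreted.
DEFINED = {"author", "title", "date", "page", "repository"}


def split_elements(elements):
    """Separate elements a receiver can understand from ones it can only carry."""
    understood = {k: v for k, v in elements.items() if k in DEFINED}
    carried = {k: v for k, v in elements.items() if k not in DEFINED}
    return understood, carried


def merge_back(understood, carried):
    """On re-export, put the carried items back unchanged so nothing is lost."""
    return {**understood, **carried}


elements = {"author": "John Doe", "title": "Census schedule",
            "film": "reel 7", "frame": "456"}
understood, carried = split_elements(elements)
```

Anything important enough for all software to interpret would be promoted into the defined set; the rest still survives the round trip.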
And yes, I would agree with the view that getting vendors on board IS the biggest issue. Without them, as you don't quite say, neither the geekiest end-user nor the end-user with no IT knowledge, will ever see successful data transfer.
And I still don't see any compelling reason for the big boys, who are software guys, not genealogists, to come on board.
Firstly, the general philosophical point is that the Data Model that we come up with for BG MUST allow for all sorts of methods of working. If we put all sorts of mandatory relationships or processes in, for instance, then the new-starters in genealogy are going to get so confused that they'll pack up and go and watch the ball-game. And even the experts have different ways of working <grin>.
You said, a while ago: "I don't know that all of us have agreed to have the source system as the clearing house for the record capture". By reason of your words "software source system creat[ing] a master source, then I decide the further dispensation of the record..." I interpret the phrase "source system" as meaning those screens and routines within your genealogy app that deal with the entry of sources. Then you say "I don't see myself creating an evidence person". That's fair enough. The whole point of the E&C Model _and_ the philosophy is that anyone who wants to create an evidence person (or persona in LDS nFS speak) from a source record, should be able to do so. And anyone who doesn't want to, doesn't need to.
So while we certainly haven't all agreed that inputting a source is in effect entering a clearing house, neither should we need to, if it's optional.
You also referred to Tom's example having "the information snippet and logic and reasoning being recorded in the database proper" and said further "If the source system is not the clearing house for these snippets and logic/reasoning, I object"
I interpret this to mean Tom's example having the proof argument / proof summary / similar justification being recorded against the event (or whatever it was) (as a note I think it was) and that you wanted it with the source and citation data.
Again, I think the whole point of the philosophy is that someone should be able to put the proof argument / proof summary / similar wherever they want. Tom happens to think it works as a note at the appropriate point. You happen to think it works best as part of the citation (or as a link from within the citation). Me - I don't want to adopt either of your solutions because (a) I'm an analytical guy that wants to split my entities apart, which means splitting my proof argument etc out from the citation but also (b) I don't see enough pointers and links in Tom's notes to enable me to get back to the very beginning of the research chain. (Sure, I can read it and work it out backwards but that's boring.)
To make all 3 views work, we actually need a data model that includes sources, citations, research notes, proof arguments, shared notes and probably loads more. Then we can all press whatever button we like. In particular, you can use your "source system" (i.e. the source and citation data) as the clearing house for your logic and reasoning, and Tom can use notes, and I can use these other screens. If they exist. And by the way, this issue actually has nothing to do with the Evidence & Conclusion Model and everything to do with where we store "proof" data etc, which applies in all methods of working.
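A hypothetical sketch of "one data model, several places to put the reasoning"; the entity and attribute names below are illustrative only. The same proof-argument record can be kept with a citation, the reasoning can sit as a note on the event, or it can be a separate entity linked back to the citations it relies on.

```python
from dataclasses import dataclass, field


# Hypothetical entities, not a proposed BetterGEDCOM schema.
@dataclass
class ProofArgument:
    text: str
    cites: list = field(default_factory=list)      # citations the reasoning relies on


@dataclass
class Citation:
    source: str
    detail: str
    proof: list = field(default_factory=list)      # reasoning kept with the citation


@dataclass
class Event:
    kind: str
    date: str
    citations: list = field(default_factory=list)
    notes: list = field(default_factory=list)      # reasoning as a note on the event
    proofs: list = field(default_factory=list)     # separate proof-argument entities


c = Citation("Parish register", "folio 12")
birth = Event("birth", "1815", citations=[c])

# Way 1: proof argument stored with (or linked from) the citation.
c.proof.append(ProofArgument("Only John baptised in the parish that year.", cites=[c]))
# Way 2: reasoning written as a note at the appropriate point.
birth.notes.append("Baptism usually followed birth within days in this parish.")
# Way 3: a separate proof-argument entity pointing back to the research chain.
birth.proofs.append(ProofArgument("Baptism and census agree on the parents.", cites=[c]))
```

Each user presses a different button, but all three end up in the same data model.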
Finally the idea that we need to "vet the evidence person concept before we make it the clearing house for anything". Again, similar thing applies - if it's in the Data Model, you don't have to use it. You can control stuff the way you always have.
In summary, the E&C Model contains within it the ability to work in the old-fashioned way so, providing we don't make anything stupid mandatory, _everyone_ can carry on how they want.
Bear in mind that I am talking about a Data Model - what happens in an application using the Evidence & Conclusion Data Model is not something I can promise you anything about. The arguments about working methods are ones you'll need to have with the writers of the software and you do have a choice there. (Assuming any of them are interested in change).
Can be found here:
On an unrelated note, when you use the [[code]] be sure to put in forced \n (line feeds), otherwise long lines in the [[code]] tags stretch out the entire thread.
I think that works for me. Alter the minimum amount possible and just let the software float up to the top of the tree. Or sink down, depending on where you start.
I must admit I'd not noticed before that the top person in the tree possibly only consists of links pointing down. That results in a lot less copying than I had in mind because I thought the merged person would include all the events and attributes.
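If the top person holds only links, whatever the program displays has to be gathered from the persons below it. This minimal sketch (hypothetical names, evidence persons as plain dictionaries) collects every value per fact and flags the facts where the evidence disagrees, which are the only places a preferred value has to be indicated.

```python
from collections import defaultdict

# Hypothetical: evidence persons as plain dicts of facts; the top-level
# person holds nothing but links to them.
evidence = {
    "E1": {"name": "John Smith", "birth": "1815"},
    "E2": {"name": "John Smith", "birth": "1816"},
}
top = {"links": ["E1", "E2"]}


def gather(person, store):
    """Collect every (value, evidence id) pair for every fact from the linked persons."""
    values = defaultdict(list)
    for pid in person["links"]:
        for fact, value in store[pid].items():
            values[fact].append((value, pid))
    return dict(values)


def conflicts(gathered):
    """Facts where the linked evidence disagrees and a choice is needed."""
    return sorted(f for f, vals in gathered.items() if len({v for v, _ in vals}) > 1)
```

Here `name` is merely duplicated, while `birth` conflicts and needs a decision.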
In which case - and I think you did mention this somewhere but... - how would you suppress one of the lower events or attributes if you decided manually that the birth date from evidence-1 was the "proper" one and should be taken, and the one from evidence-2 suppressed (as distinct from keeping both as alternatives)?
The $64K question. You have to indicate your preferred attributes somehow. And you only have to do this in the cases where there is duplication or conflict.
One solution is to "copy up" the ones you want.
Another solution would be to use a scheme like:
Again, the user wouldn't have to do this ugly stuff by hand or even see it. The user interface would make it simple. For example, the NewFamilySearch tree has exactly this functionality. When you use NewFamilySearch you can combine any number of persona records into a person record, so person records can contain gobs of redundant and conflicting information. There is a user interface in NewFamilySearch that gives you pop-up menus for every attribute, showing the multitude of values that come from the various personas. You choose the preferred value in those menus. The chosen values are the ones displayed for the person, while all the other ones remain available but tucked out of the way. The NewFamilySearch application is like a wiki in the sense that after you make your preferred selections, anyone else can change them. They can even redistribute the personas to different persons if they choose.
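To make the "pop-up menu" idea concrete at the data level, here is a minimal Python sketch, assuming a hypothetical structure in which a person groups several personas and stores, per attribute, which persona's value the user preferred. The names `Persona`, `Person`, `preferred` and `display_value`, and the sample dates, are illustrative only; this is not NewFamilySearch's data model or any agreed BetterGEDCOM structure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Persona:
    """Hypothetical evidence person: attribute values taken from one source."""
    pid: str
    attributes: dict[str, str]


@dataclass
class Person:
    """Hypothetical person record that groups personas and remembers choices."""
    personas: list[Persona]
    # attribute name -> id of the persona whose value the user prefers
    preferred: dict[str, str] = field(default_factory=dict)

    def candidates(self, attr: str) -> list[tuple[str, str]]:
        """Every (persona id, value) pair for attr, in persona order."""
        return [(p.pid, p.attributes[attr])
                for p in self.personas if attr in p.attributes]

    def display_value(self, attr: str) -> str | None:
        """Value shown for the person; the alternatives stay in candidates()."""
        options = self.candidates(attr)
        if not options:
            return None
        chosen = self.preferred.get(attr)
        for pid, value in options:
            if pid == chosen:
                return value
        # No preference recorded (or it names an unrelated persona):
        # fall back to the first value found.
        return options[0][1]


person = Person([Persona("P1", {"BIRT.DATE": "1825"}),
                 Persona("P2", {"BIRT.DATE": "1826"})])
print(person.display_value("BIRT.DATE"))  # 1825
person.preferred["BIRT.DATE"] = "P2"
print(person.display_value("BIRT.DATE"))  # 1826
```

Only attributes with duplicated or conflicting values need an entry in `preferred`; everything else displays the single value the personas supply.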
You can use your imagination to see some of the ramifications of this approach. For example, when citations are generated for a person, that citation can be customized to only cover the sources that you have chosen for the attributes to be displayed. You would never have to worry about "junk" that got added to a person showing up in any of your reports and so on.
GeneJ, I'm not sure if you are referring to my earlier statement of "I would like to see more concrete examples like this because it helps me better understand EXACTLY what the person is trying to say. Otherwise, as others have previously stated, it's easy to misunderstand what they mean." or not, but if you are, I would like to see both case studies and project examples. I also would like to see those case studies and project examples demonstrated within the scope of the E&C model Tom is proposing.
I wonder if you would put together a case study and then if we can get someone (Tom?) to show how to represent the data in the E&C model proposed. You may have already explained a case study on the wiki somewhere. If so, put a link in so we can find it.
I really think what Tom proposes can work in all cases with a few tweaks here and there, so maybe if we can get an actual example put together, it will alleviate any misgivings.
Of course, your 2nd scheme could also allow a similar means to create the different value but then it has 2 ways to record a value (actual or cross reference), which seems unnecessarily complex.
We'd also need a means to suppress a value, e.g.
1 DEAT REMOVE
to remove the death event from someone who isn't dead.
And... not sure how I'd do this yet - if you had a list of occupations on 1 evidence person plus another list of occupations on the 2nd evidence person and you wanted to suppress just some from one... Maybe you'd need to have a rule that says "If there is copy-up, then what you now have is the full list" - which seems sensibly simple anyway.
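The two rules floated here (an explicit removal marker such as `1 DEAT REMOVE`, and "if there is copy-up, then what you now have is the full list") can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions: values are kept per tag as lists, `REMOVE` is a hypothetical sentinel standing in for the GEDCOM-style marker, the sample values are made up, and any tag not overridden is merged from the evidence persons with duplicates dropped.

```python
from __future__ import annotations

# Hypothetical sentinel standing in for a marker such as "1 DEAT REMOVE".
REMOVE = object()


def resolve(tag: str, own: dict, evidence: list[dict]) -> list[str]:
    """Values a conclusion person ends up with for one tag.

    own      -- tag -> list of values (copied up) or REMOVE, stored on the
                conclusion person itself
    evidence -- one dict (tag -> list of values) per linked evidence person
    """
    if tag in own:
        entry = own[tag]
        if entry is REMOVE:
            return []          # explicitly suppressed
        return list(entry)     # copy-up: this is now the full list
    merged: list[str] = []
    for ev in evidence:        # otherwise inherit from every evidence person
        for value in ev.get(tag, []):
            if value not in merged:
                merged.append(value)
    return merged


evidence = [{"OCCU": ["Farmer", "Miller"]},
            {"OCCU": ["Carpenter"], "DEAT": ["1890"]}]
print(resolve("OCCU", {}, evidence))  # ['Farmer', 'Miller', 'Carpenter']
print(resolve("OCCU", {"OCCU": ["Farmer", "Carpenter"]}, evidence))  # ['Farmer', 'Carpenter']
print(resolve("DEAT", {"DEAT": REMOVE}, evidence))  # []
```

Under this rule the occupations case needs no syntax for partial suppression: copying up the occupations you want replaces the inherited list wholesale.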
"[why do you think]...Tom's proposal would separate a snippet from its author's identity, or how it would separate the "snippet" from a delayed birth record from the information that it was delayed? ... I may be missing something, but I don't see a reason to be afraid for this to happen?"
I think Tom and I have hashed through this over time. (À la the information IN a source vs. information ABOUT a source.)
Finding these discussion references just takes forever.
"I extract genealogical information from that source and call it evidence..."
We took an oversimplified example way too far here:
Direct Model Support for the Evidence and Conclusion Process
Please let me know if that didn't answer your question.
Separately and perhaps a little indirectly... Say I'm talking to two people--one being a data person and the other better characterized as a user :)--and I mention the "GenTech"-like E&C (don't shoot me again, Tom)--the data person's face lights up, almost like they feel finally at home.
If I continue to describe the "evidence person" concept, the user's got a furrowed brow. Their eyes glaze over about the time the data guy starts to drool.
If I try to talk about why, the user says, "I can do all that without this thing." When I explained all the sorting benefits the other night, one person sent me a note: "Oh, I agree, but I heard Microsoft has this new product called Excel."
More than one person has called it "old technology" and asked me why the data side doesn't "get it."
(See, they might want to shoot me too.)
I want to grab the record and the online citation and have my software source system create a "master source." Then I decide the further disposition of the record. I'd probably always send it to the Admin-Research (research log). Occasionally it might slide right into someone's individual record.
As I 'splained in the last meeting, I don't see myself creating an evidence person.
Adrian wrote, "[what did you mean by...] the information snippet and logic and reasoning being recorded in the database proper"
Sorry, I don't have a good way of distinguishing between the Admin-Research area, the individual, places and ships section and the "source structure" sections of the database. I'm sure y'all will let me know.
Please let me know if that did not answer your question.
You wrote, "I don't understand ...snippets"
Humm... I think it's what you call evidence. Does the response above to testuser help?
You wrote, "the clearing house for the record capture ... I don't know what this mean."
Please see the statement in context.
Perhaps also Louis' comment here:
With the link to:
And the discussion here...
You wrote, "[what did you mean by]..database proper."
See my above comment to Adrian, please. We don't have definitions for everything we'd necessarily like to have defined.
More to write .. have a meeting now. --GJ
This doesn't answer my questions, but it's not important. It's clear from your answer to Adrian and your general skepticism about evidence records in general, that you don't need them, you don't want them, you believe they have nothing to offer. Your definition of the E&C process is to collect records, and when you decide they apply to a person of interest, you add facts from the records to a person in your database. This is the person-based methodology. But most of us are interested in achieving the advantages of record-based methodology, and to enable this methodology our evidence must be structured into records that computers can process.
It sounds like you are referring to what a genealogy application will present and allow you to do, not do, require etc in the user interface. The underlying data model doesn't have to be that coupled to the user interface of the application. Just because we talk about "evidence" persons, doesn't mean that a software application has to present it to you the user in that way. In fact current genealogy programs can continue to operate the exact same as they do now. The only difference would be in what format the data is exported/imported.
An example is the Family Pursuit internal data model. Its data model has no concept of a family and yet the application can display families just like any other genealogy program. You can add and remove children/parents from a family just like any other genealogy program. When data is exported via GEDCOM it is put into families and exported that way, but that does not mean it's stored that way internally.
The advantage of an E&C model is that not only will it support the old paradigm of data storage/transfer, but it will open up the possibilities of future genealogy software that can take advantage of the approach and be able to transfer that data to other genealogy software applications that also support this new paradigm.
This may not be anything new to you or others, but I just wanted to throw it out there just in case.
"The Goal of the BetterGEDCOM Project is:
BetterGEDCOM will be a file format for the exchange and long-term storage of genealogical data.
It will be more comprehensive than existing formats and so become the format of choice."
This, in my mind, clearly states the purpose is to define a "file format for the exchange ... of genealogical data", not a genealogical research process - which would be what a genealogy program does.
Mike writes, "On Friday 8:26am (Mountain Time), Tom presented a concrete solution that elegantly solves the problem Adrian introduced. Does anyone have a reason why this model does not solve the problem? I would like to see more concrete examples like this because it helps me better understand EXACTLY what the person is trying to say. Otherwise, as others have previously stated, it's easy to misunderstand what they mean."
I don't know that all of us have agreed to have the source system as the clearing house for the record capture.
Tom's example was only enabling a citation to be "created"--I think in his example he still had the information snippet and logic and reasoning being recorded in the database proper.
If the source system is not the clearing house for these snippets and logic/reasoning, I object.
You wanted examples.
(1) I do not believe information in the source is more important than information about the source. How can you separate a snippet from its author's identity as though that snippet has some separate value? Ditto, how can you separate a snippet from the identity of its "source of the source" and believe that snippet has some separate value? How can you separate the "snippet" from a delayed birth record from the information that it was delayed? We know that sources come in all flavors. It seems to me you're trying to carefully package all the fruit in boxes and then strip the labels off.
Please let me know if you need a list of authorities for the above.
(2) The "evidence person" concept is not vetted from my perspective. Surely we have to vet the evidence person concept before we make it the clearing house for anything.
Could you explain to me why you think that Tom's proposal would separate a snippet from its author's identity, or how it would separate the "snippet" from a delayed birth record from the information that it was delayed?
I may be missing something, but I don't see a reason to be afraid for this to happen?
Can you please explain what you mean by those 2 phrases? I might take a guess but, as we know, my guessing success rate isn't good.
In particular, what's "the database proper"? Because in my IT-based view of the world, the database-proper represents all the stuff stored in or by the application, so would include people, places, families, etc (call that the real-world-side if you like); the source and repository records; and the research logs, tasks, proof statements, etc. (I've split those into 3 just in case it's useful to what you mean:
- real world;
- internal to the study of genealogy (as per current GEDCOM)
- internal to the study of genealogy (excluded from current GEDCOM) )
"I don't know that all of us have agreed to have the source system as the clearing house for the record capture."
I don't know what this means. We record sources and we record what we find in sources. We add notes and thoughts. Then we use what we find to decide who was who and why. We record why we think what we think and we put those thoughts in proper places. This is the GPS fully supported. Why do we need to worry about a "source system as a clearing house" before we talk about this? Why can't we just decide on the model, the entities it contains, how they relate to one another, and what kind of information we put into each one?
"Tom's example was only enabling a citation to be "created"--I think in his example he still had the information snippet and logic and reasoning being recorded in the database proper."
I have everything stored in the database. For me that's the point in having a genealogical system. My source records are in there, my evidence records are in there (I think they are a combination of your citations and, now, snippets), and my notes and reasoning and conclusions are in there. And they link and relate to one another exactly as logic and common sense demand and exactly as required by the GPS.
"If the source system is not the clearing house for these snippets and logic/reasoning, I object."
This implies to me that you think of your "source system" and your "database proper" as different things. Does this mean you think BetterGEDCOM needs two separate "sub-systems" -- the part you use when collecting evidence and reasoning about it -- and the part you use once you make your decisions and you want to build family groups and pedigrees? I don't know this for sure, because you bring up and use new terms without definitions -- in this case snippet, source system, and database proper. Doing this would cripple our genealogical applications to the point that they would only be able to do what, well, what applications of today can do.
In the DeadEnds model, the "source system" and the "database proper" are combined together into a single model and all the records defined by that model are stored in the same database. Is this the basis of your objection? You think they must be kept separate and independent? The "source system" is, I can only guess for you, the parts of the model that support the records-based methodology (repositories, sources, evidence, citations), and the "database proper" is the part that supports person-based methodology. And for some reason you want to keep them separate? I don't understand why anyone would want that.
"You wanted examples.
"(1) I do not believe information in the source is more important than information about the source. How can you separate a snippet from its author's identity as though that snippet has some separate value? Ditto, how can you separate a snippet from the identity of its "source of the source" and believe that snippet has some separate value? How can you separate the "snippet" from a delayed birth record from the information that it was delayed? We know that sources come in all flavors. It seems to me you're trying to carefully package all the fruit in boxes and then strip the labels off.
Please let me know if you need a list of authorities for the above."
There is nothing in the model that makes one kind of information more important than another. Every item of information is linked to the items it depends on. You may misunderstand the notion of separation in a modeling or a database sense. Your notion of separation seems to be one of "you can't get there from here", whereas the modeling notion of separation is one of clarifying and implementing the proper relationships between things. Yes, an evidence person is "different" from the source record that defines where it came from, but the evidence persons link to those source records through a relationship. They are different things, as they must be for the purposes of modeling and computation, but they cannot be called separate. You have to think of these databases, not as filled with millions of unrelated and different types of records, but as a network of objects in which every record may be related to and linked to many other records. Take for example a "conclusion person". It is linked to every evidence person that a researcher decided represents that real person. Those evidence persons contain all the citation detail and other notes recorded for them, and they also link to the source records they came from. Thus, the software, having access to the conclusion person, also has access to all the evidence, citation info, notes, thoughts, and conclusions the researcher has brought together. There is no distinction between a source system clearing house and a database proper; we have a single unified database based on a single unified model that has the entities and relationships needed to support the entire GPS.
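The "network of objects" point can be illustrated with a short Python sketch: starting from a conclusion person and following links downward reaches the evidence persons and, through them, the sources. The record ids, the `type` and `links` fields and the in-memory dictionary are all hypothetical; a real model would follow whatever reference fields it defines.

```python
from __future__ import annotations

# Hypothetical records: each one lists the ids of the records it links to.
RECORDS = {
    "S1": {"type": "source", "links": []},
    "S2": {"type": "source", "links": []},
    "E1": {"type": "evidence_person", "links": ["S1"]},
    "E2": {"type": "evidence_person", "links": ["S2"]},
    "C1": {"type": "conclusion_person", "links": ["E1", "E2"]},
}


def reachable(start: str, records: dict) -> list[str]:
    """Ids of every record reachable from start, in depth-first order."""
    seen: list[str] = []
    stack = [start]
    while stack:
        rid = stack.pop()
        if rid in seen:
            continue
        seen.append(rid)
        # Push links in reverse so they are visited in their listed order.
        stack.extend(reversed(records[rid]["links"]))
    return seen


def of_type(ids: list[str], kind: str, records: dict) -> list[str]:
    """Keep only the ids whose record has the given type."""
    return [rid for rid in ids if records[rid]["type"] == kind]


everything = reachable("C1", RECORDS)
print(everything)                              # ['C1', 'E1', 'S1', 'E2', 'S2']
print(of_type(everything, "source", RECORDS))  # ['S1', 'S2']
```

Nothing is copied between records here; the sources are found purely by following the relationships.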
"(2) The "evidence person" concept is not vetted from my perspective. Surely we have to vet the evidence person concept before we make it the clearing house for anything."
The evidence person concept has been fully vetted many times as the key addition to genealogical systems that work in the records-based paradigm. In a recent response to you I described three computing systems that deal directly with the records-based world, and all three of them used this concept as the key to their implementations. When the same concept is independently "invented" by three different teams, as the core concept they need to implement records-based handling of person-based data, it is hard to make an argument that the concept has not been vetted. And beyond the three working applications I described, the evidence person is also a key concept in the GenTech model, where it is called the persona. Even though I don't like the GenTech model, it is hard to say that the evidence person record was not vetted, as a fourth example, by them.
(This was in the discussion "The Missing Link - a new entity type or a new type of source?" for page "Research Process, Evidence & GPS" )
(Incidentally - that's Friday, 3:26 pm on my screen, so proving this Wiki stores a standard time and translates it to my local time when displaying it. Which I kinda thought was happening - unless some of you guys really are working away at 5 a.m.)
The issue referred to is basically how to represent the justification for merging 2 personas / evidence people. At least, I think that's what it was about!
What my page "Research Process, Evidence & GPS" is telling me is that the output things associated with the research process are entities. A proof argument / proof summary / whatever might explain why person X, married in 1850, and another person of the same name, baptised in 1825, really are judged to be the same person. (It might be a link to an argument whose text is elsewhere or it might be the full argument or...). Whatever it is, this "proof" is an entity in its own right. And I'll have lots of these in my database.
Now, while it's an entity in its own right, that doesn't mean I would expect the "proof" entity type to be physically separate from the research goal entity type, or the research goal entity type to be physically separate from the research log entry entity type, etc.
It is possible that these could all be sub-types of one generic entity table in the database (e.g. "Research Item"). It is even possible these could be sub-types of the Source entity type or the ... - I just haven't sorted it out yet.
So - I don't really, to be unhelpful, yet have a fully formed opinion whether Tom's proposal of a structure like this:
0 @I7@ INDI
1 INDI @I4@ <<-- the conclusion person he created.
1 INDI @I9@ <<-- the conclusion person for the Nova Scotia persons.
1 SOUR @I666@ <<-- it's him making the conclusion again
2 TEXT ... <<-- his words on why these two persons are probably the same
... would satisfy me. However, I do have concerns. My major concern is that this is all text - I don't see how to navigate back from the "words on why these two persons are probably the same" to the entities containing the research details that went into providing this conclusion. I would suggest that there needs to be a cross reference to point to the "proof" entity.
I am loath to create an extra entity type to represent that "proof" and link to it when means already exist to point back to something - e.g. to a Shared Note or to a Source.
One could create a Source to hold the text of the "proof". I think there are 2 issues with this - firstly many people cannot accept the concept of a Source created "inside" the system (as a proof-source would be) and imagine a source must exist somewhere out in the real world. The concept of an internally generated source doesn't worry me but as a mathematician I'm used to iterations and generalisations (e.g. a Set of Sets). However, the 2nd issue of using a source record to physically contain a proof argument, is that a conventional proof argument will contain text with citations for bibliography, footnotes, etc. None of this typically appears in the text linked with a source. (Other items may point to the repository, etc. for the source but that's different from a whole series of citation references inside text.)
I am, therefore, inclined towards creating a sub-type of the Shared Note entity (I call it 'Shared Note' meaning it's the one equivalent to the Level 0 type of note in GEDCOM, not the in-line note.) This (or these) sub-types of the Shared Note entity could be created to contain the research notes and - interestingly - should therefore appear as visible notes in a program that did not implement research notes, logs, etc., rather than be rejected. (Updating them would be very dangerous but at least they would be visible).
Whether or not all research "things" could act as sub-types of Shared Notes, I personally have no idea since I've not yet sorted them all out in my head. But that's how I'm thinking about recording the justification for merging 2 evidence people / personas.
BUT BUT BUT... The topic of how to point back to research notes and proof arguments is one that applies to both the Evidence & Conclusion Model and the conventional Conclusion-only Model. So whether you agree with my ideas on how to point back to research or not, does NOT impact on how to progress the Evidence & Conclusion Model.
I am not sure if these topics have been fully discussed or not. (If they have been, feel free to contradict yourself - I'll never know).
In general terms, I suspect that a description of the Evidence & Conclusion Model will have to include descriptions of how to update the entities concerned since the "methods" (to use an IT term) will not be obvious as the Evidence & Conclusion Model is not part of the real world, but part of the world of genealogy where each of us may have a different methodology. A couple of these update methods that need to be defined (with examples) are:
- how to update (or not) a family (otherwise unaltered) containing a person whose records have been superseded by a higher level person in the hierarchy
- confirmation that evidence and conclusion applies to more than just persons. And if so, how do we deal with an update of a place when that place is referenced across the database?
- how to update a family when we are adding new data about the family's events and therefore creating a new conclusion family? And what about the people who are members of the family?
These are probably just aspects of the same thing or can all be solved by the same ideas.
Example 1 - updating a group. Suppose I have a source record for the "Xshire Militia". I create an evidence group for the "Xshire Militia" containing just the information from that source. (This is the equivalent of a persona but for a group. And no, I'm not going to call it a groupa).
Then I have a source record for the "Royal Xshire Militia". I create an evidence group for the "Royal Xshire Militia" containing just the information from that source.
Then I have a Source record saying that the "Royal Xshire Militia" is simply the "Xshire Militia", renamed in year Y.
If I am working on the Conclusion only model then I do a destructive merge of those 2 groups.
If I am working on the Evidence & Conclusion model then I ought to apply the E&C principle to everything, so I create a conclusion-group with 2 dated attributes for Name, and point the new conclusion-group back to the 2 evidence-groups.
Now - what about all my relatives who served in the militia? They'll be pointing to either Group1 ("Xshire Militia") or Group2 ("Royal Xshire Militia"). But Group1 and Group2 have now been superseded. I reckon there are 2 possibilities:
(1) amend all the references to Group1 and Group2 to read "Group3"
(2) leave them untouched but make the software follow the Group reference up the conclusion tree
Method 1 is a pain because any amendment also needs to create a new conclusion person which just goes on and on and on.
Method 2 makes more sense to me - if the software writes a report about group membership, it comes to "John Doe" and his membership of Group1, finds that Group3 has superseded Group1 (because Group3 points back to Group1), and skips on to Group3 to get the details of the Militia for the report.
I reckon similar principles apply to updating a family, where the family's events and attributes do not change - if person P1 has been superseded in the conclusion tree by person P2, then don't change the family membership to replace P1 by P2, simply allow the software for the Family Reports to navigate first of all to P1, then float up the tree to P2.
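Method 2 only needs a reverse index from each superseded record to the conclusion record that points back to it. The Python sketch below is one way to do that, assuming each lower-level record is superseded by at most one higher-level record (a tree, as described here); `build_superseded_by`, `float_up` and the ids are hypothetical. The same lookup works for groups and for persons referenced from a family.

```python
from __future__ import annotations


def build_superseded_by(conclusions: dict[str, list[str]]) -> dict[str, str]:
    """Reverse index: lower-level id -> id of the record that supersedes it.

    conclusions maps each conclusion record to the records it points back to.
    """
    index: dict[str, str] = {}
    for cid, lower_ids in conclusions.items():
        for lid in lower_ids:
            index[lid] = cid
    return index


def float_up(rid: str, superseded_by: dict[str, str]) -> str:
    """Follow supersession links from rid to the top of the conclusion tree."""
    visited: set[str] = set()
    while rid in superseded_by:
        if rid in visited:
            raise ValueError(f"cycle in conclusion tree at {rid}")
        visited.add(rid)
        rid = superseded_by[rid]
    return rid


# Group3 supersedes both militia groups; P2 supersedes P1, P3 supersedes P2.
index = build_superseded_by({"Group3": ["Group1", "Group2"],
                             "P2": ["P1"], "P3": ["P2"]})
membership = {"John Doe": "Group1"}  # the stored reference is left untouched
print(float_up(membership["John Doe"], index))  # Group3
print(float_up("P1", index))                    # P3
```

Because the stored references never change, adding a further conclusion level later only means adding one more entry to the index.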
I think the other instances I asked about can be solved similarly.
Is this right? Missing something?
Your method two in the Xshire Militia example is definitely the one to use.
The idea of "building a tree" of conclusions should always be to defer to the lower level records when possible. The links are there. Let's use them.
Here is a very simple example.
Let's say I have a birth record that mentions facts about the child and his parents. I extract the info available into the following five records (using GEDCOM for syntax -- could use XML or JSON, but why confuse matters?):
Here we have one source record, three evidence person records, and one evidence family record, all extracted from a single source in the real world.
Later on I discover a death certificate from Brooklyn, New York, leading to the two new records:
I now have four evidence person records, two for a Daniel Wetmore. Let's make this very simple and decide that these two Daniel Wetmores are the same person. They in fact are, but I have 40 or more records that follow his entire life from New Brunswick to Brooklyn, so I know this. To just combine what I have shown here would not be such a good thing to do.
Before I join them consider this. One record has his birth info. The other has his death info. There is no overlap between the two. (Let's say his birth place was marked as unknown on his death certificate.) The other thing to note is that his name is different in the two records.
Here is how I would build the conclusion person record for him:
Note that there is no sex, birth or death information given in the conclusion person. All this is "inherited" directly from the evidence records.
Note that this conclusion person also inherits the family he is in with John Wetmore and Anna Van Cott being his parents.
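The inheritance described in the two notes above can be sketched as a function that builds the view a program would display. This is a minimal Python sketch under assumptions: the tags follow GEDCOM, but the values are placeholders rather than the actual records, the family id `@F1@` is hypothetical, `effective_view` is a hypothetical name, and tags stored on the conclusion person are assumed to take precedence over the same tags on the evidence persons.

```python
from __future__ import annotations


def effective_view(conclusion: dict[str, list[str]],
                   evidence: list[dict[str, list[str]]]) -> dict[str, list[str]]:
    """Build the displayed view of a conclusion person.

    Tags stored on the conclusion person win; every other tag (sex, birth,
    death, parent family, ...) is inherited from the linked evidence persons,
    keeping each distinct value once, in link order.
    """
    view = {tag: list(values) for tag, values in conclusion.items()}
    for ev in evidence:
        for tag, values in ev.items():
            if tag in conclusion:
                continue  # the conclusion person's own value takes precedence
            bucket = view.setdefault(tag, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    return view


# Placeholder evidence: one person from a birth record, one from a death record.
from_birth = {"NAME": ["<name on birth record>"], "SEX": ["M"],
              "BIRT": ["<birth details>"], "FAMC": ["@F1@"]}
from_death = {"NAME": ["<name on death certificate>"], "SEX": ["M"],
              "DEAT": ["<death details>"]}
# The conclusion person stores only the chosen name (plus its links).
daniel = {"NAME": ["Daniel Van Cott /Wetmore/"]}

for tag, values in effective_view(daniel, [from_birth, from_death]).items():
    print(tag, values)
```

Nothing but the chosen name is duplicated into the conclusion person; birth, death and the parent family come straight from the evidence persons.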
How much of this does the user of the software have to know about? Very little, in fact. The user interface would never show him/her this ugly stuff. He/she just sees a nice Daniel Van Cott /Wetmore/ with a nice birth record and a nice death record. If this Daniel Wetmore needs to be picked apart in the future there will have to be user interface screens to show the structure of the person, but that is necessary, and still wouldn't have ugly GEDCOM tags to deal with.