BetterGedcom - Evidence and Conclusion Process

ttwetmore 2011-03-04T23:32:04-08:00

Direct Model Support for the Evidence and Conclusion Process

There has been recent discussion about the "steps" taken in the genealogical process and whether the Better GEDCOM model can somehow follow those steps, and whether the data extracted from a database into a Better GEDCOM file would be able to reflect the exact state that ongoing research was in at the moment the data was exported.

The answer to this question is a resounding YES, and the ability to make this happen was one of the main design goals of my DeadEnds Data Model, and therefore a major goal of my attempts to help shape Better GEDCOM so it can do the same thing.

In the DeadEnds data model a Person record may refer to (contain in a data structure sense) an array of other “sub” Person records. When this happens the “top” Person represents the CONCLUSION that all the “sub” Persons hold information about the SAME real human being. The “top” Person should refer to a Source record with the researcher’s proof statement or justification why he/she believes the conclusion. These TREES of Person records may be built up to any depth, reflecting the reality that the researcher may have to make many decisions about many records as he/she builds up the final “top” Records that represent the researcher's ultimate conclusions about the human beings he/she is researching. At any given time the bottom LEAVES on these trees are the Person records created directly from Evidence in Source material, the ROOT of the tree is a current top-level conclusion Person, and all the INTERMEDIATE Persons in the tree are intermediate conclusions made during the process. EACH OF THESE DECISIONS MADE BY A RESEARCHER IS ONE OF THE STEPS ALONG THE RESEARCH PROCESS WE TALK ABOUT.
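As a rough illustration only (this is not the actual DeadEnds implementation), the tree structure described above might be sketched in Python as follows. The class name `Person`, its fields, and the helpers `leaves` and `depth` are hypothetical names chosen for this example, and the records are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    """Hypothetical person record; field names are assumptions for this sketch."""
    name: str
    source: str  # source citation for a leaf, proof statement for a conclusion
    subs: List["Person"] = field(default_factory=list)  # the "sub" Person records


def leaves(person: Person) -> List[Person]:
    """Return the evidence persons (records with no subs) under a record."""
    if not person.subs:
        return [person]
    found: List[Person] = []
    for sub in person.subs:
        found.extend(leaves(sub))
    return found


def depth(person: Person) -> int:
    """Number of levels in the tree; a lone evidence person has depth 1."""
    return 1 + max((depth(sub) for sub in person.subs), default=0)


# Made-up records, for illustration only.
census = Person("J. Doe", "census entry")
directory = Person("John Doe", "city directory entry")
deed = Person("Jno. Doe", "land deed")
intermediate = Person("John Doe", "proof: census and directory match",
                      [census, directory])
root = Person("John Doe", "proof: deed matches the intermediate conclusion",
              [intermediate, deed])

print([p.name for p in leaves(root)])
print(depth(root))
```

Here `root` is the current top-level conclusion, `intermediate` is an intermediate conclusion, and the three records without subs are the evidence-level leaves.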

The building up of these trees of Person Records is AN EXACT COMPUTERIZED ANALOG of following the normal genealogical research process. That is, the normal, rich, genealogical process can be represented by genealogical application programs as ALGORITHMS FOR MANIPULATING THESE CONCLUSION TREES. The state of these trees at any given point in time represents the EXACT STATE OF THE CURRENT RESEARCH. EVERY STEP THAT A RESEARCHER HAS TAKEN IS REFLECTED EXACTLY BY ONE OF THE GROUPINGS OF RECORDS THAT TOOK PLACE IN FORMING THE CURRENT STATE OF THE PERSON TREES. This property of a model to be simple yet be perfectly in tune with a process we want to support is PROFOUND. If Better GEDCOM can take advantage of this it will succeed.

This is the genealogical research process boiled down to a data model that allows Person records to be organized into conclusion trees and a process that is supported by building up and modifying these trees. This is a DREAM SITUATION FOR SOFTWARE. We have a well understood process (genealogical research) and we have a DATA MODEL that IS PERFECTLY SUITED TO HOLD A COMPLETE REPRESENTATION OF THE COMPLETE STATE OF THAT RESEARCH AT ANY POINT IN THE PROCESS. And EACH STEP IN THE PROGRESS OF THAT RESEARCH IS REFLECTED EXACTLY IN CHANGES IN THE CONTENTS OF THE DATA IN THE SOFTWARE. The important point here is that the normal genealogical research process, the one taught in books, the one people are tested on to get certified, can be thought of in data model terms as manipulations on these conclusion trees.

Note three important points.

1. Building up of conclusion trees is a perfect representation of all the post-evidence gathering steps in the normal genealogical research process.
2. Original evidence records are never changed or destroyed. All conclusions are implemented by adding new records that bring together two or more other records that are believed to represent the same human being.
3. Any step can be undone and the process is reversible throughout. Any decision can be rescinded by simply removing the conclusion person record that brought a set of records together.
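Points 2 and 3 can be sketched as a small in-memory store. This is a minimal sketch under stated assumptions, not the DeadEnds code: the names `Database`, `add_evidence`, `conclude` and `rescind` are hypothetical, and records refer to each other by simple string ids.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Person:
    """Hypothetical record; `subs` holds the ids of the joined records."""
    rec_id: str
    note: str
    subs: List[str] = field(default_factory=list)


class Database:
    def __init__(self) -> None:
        self.records: Dict[str, Person] = {}

    def add_evidence(self, rec_id: str, note: str) -> None:
        self.records[rec_id] = Person(rec_id, note)

    def conclude(self, rec_id: str, sub_ids: List[str], proof: str) -> None:
        """Add a NEW record that joins existing ones; nothing is modified."""
        missing = [s for s in sub_ids if s not in self.records]
        if missing:
            raise KeyError(f"unknown records: {missing}")
        self.records[rec_id] = Person(rec_id, proof, list(sub_ids))

    def rescind(self, rec_id: str) -> None:
        """Undo a decision by removing only the record that made the join."""
        if not self.records[rec_id].subs:
            raise ValueError("evidence records are never removed")
        if any(rec_id in p.subs for p in self.records.values()):
            raise ValueError("rescind the higher-level conclusion first")
        del self.records[rec_id]


db = Database()
db.add_evidence("e1", "evidence person from first source")
db.add_evidence("e2", "evidence person from second source")
db.conclude("c1", ["e1", "e2"], "proof statement: same human being")
db.rescind("c1")  # the decision is undone; e1 and e2 are untouched
print(sorted(db.records))
```

Requiring higher-level conclusions to be rescinded first is a design choice of this sketch, made so that no record is left pointing at a deleted one.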

ALL IT TOOK IN THE DEADENDS MODEL TO SUPPORT THIS WAS TO ALLOW PERSON AND EVENT RECORDS TO REFER TO AN ARRAY OF SUB-PERSONS OR SUB-EVENTS. This is why I continue to strongly advocate the Better GEDCOM Data Model provide the same feature.

ttwetmore 2011-03-12T12:17:31-08:00

I believe the ESM definition of negative evidence is evidence that you feel should exist but you can't find. I don't see how it applies to Adrian's example. I must be missing something.

ttwetmore 2011-03-12T12:39:10-08:00

Trying to pick up with Gier's extension of Adrian's example. I think we find a death record of a person named X, born in Y in year Z, and the death occurred before the census of the second evidence person. And we found it after Adrian joined the first two evidence persons into a conclusion.

This isn't negative information. This is conflicting information. You could remove the original conclusion person and end up with just three evidence persons to await further research. That would probably be the best idea. Or you could join the first with the third and leave the second alone. You can't know what the right answer is in this case.

We have to deal with conflicting data frequently. You make your best guess and document why you did what you did. A great thing about the E & C model is that your evidence records stay intact for all time. You can undo and redo conclusions to your heart's content.

I should mention the rules that are used for "nominal record linking". This is the technique used by historians who are trying to establish family patterns and family statistics in villages centuries ago from church registers. The goal is to find out things like the average sizes of the families, the average spacing of children, and the consanguinity of spouses. The rules specify how to evaluate the combinations of birth, marriage and death records and decide how to combine the records into individuals. This is a specific set of rules for doing a specialized work. It's a very specific evidence and conclusion model. The point of bringing this up is that the process often finds missing or ambiguous information. The rules say how to handle the different cases. Of course, in this work, the researchers aren't all that concerned with perfect family reconstructions, only with family reconstructions that are consistent with the average patterns. Of course, we are interested in correctness, but we can't always have it.

I think the only rule you would want to apply is that any evidence person can be in at most one conclusion person.
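A check for that rule could look something like the sketch below. The function name and the dictionary layout (conclusion id mapped to the ids of the records it directly joins) are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List


def find_rule_violations(conclusions: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return every record id that is a direct member of more than one conclusion.

    `conclusions` maps a conclusion person's id to the ids it joins
    (a hypothetical layout, assumed for this sketch).
    """
    membership: Dict[str, List[str]] = defaultdict(list)
    for conclusion_id, sub_ids in conclusions.items():
        for sub_id in sub_ids:
            membership[sub_id].append(conclusion_id)
    return {rid: owners for rid, owners in membership.items() if len(owners) > 1}


# Joining the first evidence person with the third and leaving the second alone is fine.
allowed = {"c1": ["e1", "e3"]}
# Keeping the old join as well would put e1 into two conclusions.
conflicting = {"c1": ["e1", "e2"], "c2": ["e1", "e3"]}

print(find_rule_violations(allowed))
print(find_rule_violations(conflicting))
```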

AdrianB38 2011-03-12T13:41:19-08:00

"In your example, I don't see why you need to record a, b or c"

I believe I surely have to record them somewhere - there may be profitable debate over how much I need to record about them and where.

Let me take a fictitious example:
George X is baptised in 1795 at Northwich, Cheshire.
George X is present in the 1851 census, where it says he's born about 1797 at Northwich, Cheshire.

How do I know that these two are the same person? Well, if there's another George X, baptised 1796 at Northwich, Cheshire, then all bets are off because right now I don't know which of the two survive to the 1851 census. Suppose though, that there is no evidence in the church (or any other records) of another George X being baptised at roughly that time in roughly that area. Then that's a good step forward.

So somewhere I need to record this factlet - not only is the baptism for George X in 1795 at Northwich, Cheshire, but also it's the _only one_ that satisfies certain search criteria. Where would you, Tom, (in your non-automatic mode!) think that should go? Just as notes added to the Source or to the Evidence Person arising from the 1795 baptism? Or notes on the conclusion person?

That's (a) dealt with.

(b) and (c) are designed to check if there is another George X born about 1795 at Northwich, Cheshire, who was _not_ baptised. If this is true, then there were two such chaps born and one of two things holds
- either both survive to the 1851 census, in which case there'll be two chaps visible as George X born about 1795 at Northwich
- or one has died before the 1851, in which case there'll only be one chap visible as George X born about 1795 at Northwich but there should be a record _somewhere_ for the burial.

So, again, to decide which is the case, I'd want to record that (b) in the 1851 census, there is (say) only one George X born about 1795 at Northwich who can be seen in the census. Again - where would be best? - against the Source details, against the evidence person for the census or what?

And finally, there's the fact that I've searched the burial records in the Northwich area and a few adjacent parishes as well, and I want to record that there's no evidence of any burial of a George X of that age. This is (c) - again, where's best for this data to go? On the evidence people or the conclusion person or...? This is the one I called negative evidence since there isn't a burial source that I can note anything against - I've done a search and turned up nothing. But I can understand if this is felt to be stretching the point of what negative evidence is.

Once I've got all those (a), (b) and (c) things sorted, then I can say that the two Georges are the same guy. It's the recording of these bits that I need to sort out in my head. I doubt it's any more sophisticated than notes - but the more notes we have, the easier it is for something to sneak in or out. Do we need specific logic related notes, for instance, rather than just ordinary text notes?

GeneJ 2011-03-12T16:10:05-08:00

Tom wrote, "I believe the ESM definition of negative evidence is evidence that you feel should exist but you can't find. I don't see how it applies to Adrian's example. I must be missing something."

Perhaps we are missing something?

In developing materials for a proof, it's pretty common to research and document alternative hypotheses. That is, to exhaust alternative theories to the research hypothesis.

Example:
Joe's many many records consistently report him born at a particular place; his date of birth is just a little inconsistent in the records (including partial and inferred dates); parents' names not direct in the records.

Just as you'll pull together the information that supports or conflicts with your hypothesis, you'd consider and test other hypotheses, right?

GeneJ 2011-03-12T16:23:48-08:00

About Adrian's George X.

The hypothesis he wants to test is that George X (2) is the same man as George X (1).

Other than his research to show they are the same man, in the case he presents, for the alternative "they are not the same" -- you also have two research paths to document:

(i) That George X (1) could not have been George X (2).
(ii) That George X (2) could not have been George X (1).

ttwetmore 2011-03-12T18:09:51-08:00

Lots being said. Let me describe what I think.

The administrative objects (goals, etc) are where we record what we are trying to do and prove, and where we log what we have searched, where and when, and also log, possibly, a list of what we found. Then the detailed facts of what we found go into the evidence objects. We don't mix administrative and evidence records. To say we have searched a set of records and not found anyone named X is a fact that would be attached to an administrative object. We certainly don't want to create an X evidence record and then inside it say we didn't find this person. Evidence records should only hold the evidence that's really there and they should point to the source objects they came from. I admit I am not well conversant with the administrative objects, but these things seem reasonable.
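To make the separation concrete, here is a minimal sketch of an administrative search-log entry kept apart from the evidence records. The class `SearchLogEntry` and its fields are hypothetical names for this example; the wording of the entries borrows from Adrian's fictitious George X case.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchLogEntry:
    """Hypothetical administrative object: what was searched, and what turned up."""
    goal: str
    records_searched: str
    criteria: str
    found_evidence_ids: List[str] = field(default_factory=list)  # links only

    @property
    def nothing_found(self) -> bool:
        return not self.found_evidence_ids


def negative_results(log: List[SearchLogEntry]) -> List[SearchLogEntry]:
    """Searches that found nothing; no evidence record is created for these."""
    return [entry for entry in log if entry.nothing_found]


goal = "Is the George X of the 1851 census the George X baptised in 1795?"
log = [
    SearchLogEntry(goal, "1851 census", "George X born about 1795 at Northwich",
                   found_evidence_ids=["census-george-x"]),
    SearchLogEntry(goal, "burial records, Northwich area and adjacent parishes",
                   "burial of a George X of that age"),
]

for entry in negative_results(log):
    print("Not found:", entry.criteria, "in", entry.records_searched)
```

The "not found" fact lives only in the log entry; the evidence side holds nothing about a burial that was never located.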

When it comes to making conclusions you have the evidence you found and you have record of what you did. Beyond that there is no magic. You must make your conclusions based on those two kinds of things. If a name is a very rare one that has only been seen in a way that is consistent with the person you are looking for, you can have high confidence when you combine evidence. If you have a very common name and clearly conflicting data, you shouldn't join them. There's lots of room in the middle.

If you read articles from the stodgiest of the stodgy, the New England Historical and Genealogical "Register", you will see how the real professional researchers often couch their conclusion statements with all kinds of qualifications. Like "this is likely the John Smith who was listed as Jno Smythee in such-and-such as a soldier in King Philip's War." Or "this Thomas Winslow may have been the Tho Wynsslough made Freeman in Salem Court in 1677." It's clear even the pros can't always deterministically decide who is who, so we should emulate them and do as well as we can. It's Occam's razor or the law of parsimony or whatever you want to call it. Search the records, record what you find and what you don't find. Make your conclusions based on what you feel best explains the data, but in your proof statements point out the weaknesses in your arguments.

If you have a good system based on a good E & C model, then as you get new evidence and work on new goals, you can always rearrange the evidence into different sets of conclusion objects as rules of parsimony and logic dictate. Let the knowledge that you might not get it perfect set you free.

GeneJ 2011-03-12T21:47:45-08:00

I record negative evidence frequently as a standard full reference note.

See also Requirements Catalog, Source01-Information, Source and Evidence type (quoting):

BetterGEDCOM should record separately whether a Source is, for a given event or characteristic:

* Primary or Secondary Information (latter includes tertiary)
* Original or derivative source (e.g. paper or copy/digital image; document or compiled summary; document or transcribed version)
* Direct, indirect or negative evidence
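One way to read that requirement is as three independent classifications stored per source and per claim. The sketch below assumes that reading; the enum and class names are hypothetical, and the death certificate is a made-up example.

```python
from dataclasses import dataclass
from enum import Enum


class Information(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"  # includes tertiary


class SourceForm(Enum):
    ORIGINAL = "original"
    DERIVATIVE = "derivative"


class Evidence(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class Assessment:
    """Hypothetical record: one source judged for one event or characteristic."""
    source: str
    claim: str
    information: Information
    source_form: SourceForm
    evidence: Evidence


# The same made-up source is assessed separately for two different claims.
assessments = [
    Assessment("death certificate (image)", "death date",
               Information.PRIMARY, SourceForm.DERIVATIVE, Evidence.DIRECT),
    Assessment("death certificate (image)", "birth date",
               Information.SECONDARY, SourceForm.DERIVATIVE, Evidence.DIRECT),
]

for a in assessments:
    print(a.claim, a.information.value, a.source_form.value, a.evidence.value)
```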

GeneJ 2011-03-12T22:30:19-08:00

Some examples of negative evidence that take the form of reference notes follow. These extracts were located in the Board for Certification of Genealogists Work Samples (http://www.bcgcertification.org/skillbuilders/worksamples.html).

From Kay Haviland Freilich, CG, "Was She Really Alice Fling? Righting a Wrong Identity"; published _Quarterly_ 88 (Sept 2000): 225-28: (quoting; some formatting lost in transfer)

Chester County Orphans Court—Minors, Seeds, 1758, Chester County Archives. Emphasis added.
This file also contains an invoice dated 30 October 1758 for “doctoring Richard Seed.” He obviously died before 1785, given that he is not named in the 1785 or 1797 documents.

Doc. 2186, ibid., emphasis added. Chester County’s recorded wills do not include one for Abigail Seeds; and the present writer has found no record of a marriage for her.

From, "Who was the mother of James^2 Paule (1657-1724) of Taunton, Massachusetts?"; published TAG 73 (Oct 1998): 312-15 (quoting; some formatting lost in transfer):

Shurtleff and Pulsifer, Plymouth Colony Records, 3:122. See also Wakefield, "Richmond Family [....] The date and place of their marriage is unknown; it does not appear published in the vital records of Taunton or Newport.

Neither of Hannah Paule's parents is named in the colony's transcript of Hannah's birth record (Shurtleff and Pulsifer, Plymouth Colony Records, 8:69), but that transcript [...]"

... There is no record of further action in this case; perhaps Hannah's removal to Plymouth ...

From, Roger D. Joslyn, "Rebecca, wife of Thomas^1 Josselyn of Hingham and Lancaster, Massachusetts"; published _Register_ 158 (2004):330-40 (quoting; some formatting lost on transfer):

Middlesex County Probate, First Series, 3:238–39; see also Rodgers, Middlesex County Records of Probate [note 4], 626–27. There are no probate papers for this estate.

Ardleigh parish registers, FHL 1,565,698. These two baptisms were discovered by Leslie Mahler of San Jose, California, and sent to Robert C. Anderson, who shared them with the author. Peter C. Nutt also examined a transcript of Ardleigh registers at the Essex Record Office (ERO T/R168/1) but found no other Joslin baptisms and no Joslin burials.

Actually, there are no Jude/Judd wills for persons from Essex parishes surrounding Radwinter in the time period 1400–1720 (F. G. Emmison, Wills at Chelmsford (Essex and East Herefordshire) [1400–1858], 3 vols. [London: The Index Library (The British Record Society, Limited), vols. 78, 79, 84, 1957–69], 1:239, 2:204–05).

AdrianB38 2011-03-13T04:59:21-07:00

Tom - to highlight just one bit from your thoughts: "To say we have searched a set of records and not found anyone named X is a fact that would be attached to an administrative object"

OK, this sounds like one good approach. It redraws the process flow diagram (whatever you want to call it) that I had in my mind - my inputs to that were previously:
- evidence records;
- a few control facts like census dates, laws, etc;

And the output was
- the description of the logic successfully applied (i.e. I'm not interested in the logic that didn't work unless it seems useful in killing a prevalent story);
- the conclusion entity (or set of conclusion entities)

To redraw it then with the "admin" objects, means the inputs are now:
- admin entities (such as hypothesis, objectives, searches done, summary of results of searches including statement of "not found for these criteria")
- evidence records;
- a few control facts like census dates, laws, etc;

No change to the outputs.

AdrianB38 2011-03-13T05:07:40-07:00

Gene - your examples are useful. Quite where they should get stored in the database, I haven't really thought through, as in my view of the world, the citation footnotes and the bibliography all get concatenated from bits held in various places. Where those places should be is a thought for another day...

GeneJ 2011-03-13T08:46:04-07:00

" ..examples are useful. Quite where they should get stored in the database, I haven't really thought through, as in my view of the world, the citation footnotes and the bibliography all get concatenated from bits held in various places. Where those places should be is a thought for another day..."

In my database, I have 40,000 full reference notes; each is associated with its source list/bibliographic entry.

My full reference notes are carefully developed, not "concatenated from bits held in various places."

My source list entries are planned.

My evidence is recorded in full reference notes and source lists whether I'm in TMG, FTM for Mac, Reunion, GenBox, etc.

Call it what you want .. sources, citations, footnotes, endnotes ... if and until BetterGEDCOM addresses these needs, these requirements, then it isn't/hasn't begun to address(ed) evidence.

AdrianB38 2011-03-13T09:47:47-07:00

Gene - "until BetterGEDCOM addresses these needs, these requirements, then it hasn't begun to address evidence"

I totally agree. I also know that "citations and sources" is a huge topic that needs its own set of pages and I'm conscious that - one short discussion aside - we've not touched it yet.

I'll have a look at your last post on that other thread....

GeneJ 2011-03-10T00:55:50-08:00

Tom wrote, "I call that line evidence because not only is it an item of information, it is also information that applies directly to my research goal of discovering my ancestors; because it is info that I am going to use to help meet my goals, it is evidence. Is there any disagreement about this; if so what is that disagreement?"

The line itself, I would call an "information statement." Mills wrote, "Evidence, on the other hand, is our interpretation and use of an information statement."

(1) After a record is found, even when I've determined it's relevant to my family, research continues. I want to confirm my identification is correct and develop an understanding of what the information means. This part of the process depends on the new information and the existing body of evidence. I may have to pursue other sources, too. Sometimes new information just fits like a glove into an existing body of evidence. Sometimes it doesn’t.

Is there anything particular about this source we should consider--torn pages, missing page numbers? Was information arranged both alphabetically and by street? What about that information entry itself? Do you have an ancestor, "Daniel C. Wetmore," who is known to have been alive at 1886, or did Aunt Sally say he died in 1880 and he's not been located in the census that year? Will this be the latest known record you have about dear ol' Daniel? Did you otherwise know his middle initial, or do you know his middle initial was J? Did he live at Norwich, or do you last have him as a bachelor at Salem? Is he known to have been a carpenter or a fisherman? Is there evidence of the foregoing already entered into your database in good form? If you hadn't explained (providing comment that was interesting in its own right to me), I'd have asked if you knew what "N L T" means and had you recorded a reference for that knowledge? Have you previously located deeds and know that he owned a home at "NLT?" Any knowledge that he built the home there? Anything significant in Norwich about this time, or carpenters in Norwich, that might be of note? Neighbors? Were other family members living at Norwich at the time (maybe family members were neighbors; maybe his in-laws were neighbors)? How does knowing the 1886 city directory information impact on your understanding of Daniel? His family? Of his parents or siblings? (If you believed Daniel did live there in 1886, but checked the directory and didn't find him, you might feel that is negative evidence. Ditto, if you didn't find a parent or sibling in Norwich on 5th, that might be negative evidence.)

(2) A plan to use the interpreted information.

(a) If your file is organized along biographical lines, this city directory might support an existing assertion that Daniel was a carpenter, or that he lived at Norwich along "NLT." Depending on the circumstance of your research, the item might also support that his middle initial was C. Perhaps it could lead to a new biographical tag about Daniel, too. Maybe the information represents a conflict with other results--his middle initial was J, he was a fisherman, he died in 1880 or lived in California.

(b) If your file tends to emphasize or be organized more like a research report, then you might create a tag "City Directory" and use the information statement to support same. [1]

(3) Based on how I interpret and plan to use the information, I finalize the reference note(s) and source list entry, and enter/finish entering all into the file. In the examples below, I used the city directory for Elisha M. Bevins, as quickie hypothetical presentations I might make.

(a) In my working citations, I include information snippets. In a more final form (a biography, for example), the citations might appear a little differently.

Pretty standard city directory citation in my file:
"Massachusetts City Directories," Salem and Beverly City Directory, 1886, p. 99 (Salem), entry for Elisha M. Bevins, fish dealer, 6 Washington sq, house at Beverly; digital images, _Ancestry.com_ (http://www.ancestry.com : accessed 16 October 2006).

Hypothetical variations for me:
"Massachusetts City Directories," Salem and Beverly City Directory, 1886, p. 99 (Salem), entry for Elisha M. Bevins, fish dealer, 6 Washington sq, house at Beverly; digital images, _Ancestry.com_ (http://www.ancestry.com : accessed 16 October 2006); his son Elisha M. Bevins, Jr. also listed, also a fish dealer with same locations; ad on page 103 for Bevins & Bevins at 6 Washington, no further information.

"Massachusetts City Directories," Salem and Beverly City Directory, 1886, p. 99 (Salem), entry for Elisha M. Bevins, fish dealer, 6 Washington sq, house at Beverly; digital images, _Ancestry.com_ (http://www.ancestry.com : accessed 16 October 2006). Separately noted, "Salem Vital Records, 1849-1910" report the death of Elisha M. Bevins 15 November 1885; 1900 U.S. Census reports his widow was still residing at 6 Washington ...

(b) I consider how the source will appear in the source list. For example, if I had volumes of entries from the Ancestry.com collection, "Massachusetts City Directories," I might think it's best to have the sources listed at that collection level. (In the case of census, for example, my entries are now organized/listed at the county jurisdiction.) Maybe I only have a couple directories, from different areas, so I just list them separately.

[1] The "information statement" in the city directory is short; if it were extensive (a pension file, for example), only a relevant clip might appear in my working file. In that case the clip for the "occupation" might be different than for residence.

ttwetmore 2011-03-10T04:44:17-08:00

GeneJ,

You say, "The line itself, I would call an "information statement." Mills wrote, "Evidence, on the other hand, is our interpretation and use of an information statement."

Thanks, that's great to hear. I would still call the line evidence, because I have analyzed it in my mind and decided that it will probably help me meet my goals. But calling it an information statement makes great sense too. In keeping with the ESM definition, however, my Evidence Person record is definitely Evidence because it is my interpretation and use of an information statement. Excellent. That was the question most important to me. I don't really have to call the line in the source anything, because it never gets realized as an object in the model. But I do have to call the record something because it does become a real thing. The ESM definition states unequivocally that what I call an Evidence Person is evidence, so that forms the linchpin concept.

I read the rest of what you wrote with great interest. You are describing in detail how you think about things, what things you consider, and so on, as you are gathering and using information. I do all these things also. They break into two kinds of activities in my mind.

1. Deciding what the info statement means so I can decide whether it really is evidence for my goals; if so I create an Evidence record from it.

2. Using that evidence, and all the other evidence I have gathered, to help me reason about my goals and make my decisions; this leads to new Conclusion records.

I think everything you wrote fits into one of these two categories. I don't record in my data all the thinking behind 1) because in most cases for me that is just overkill. Sometimes I do record notes when I think it's not obvious. However, 2) is the research process, and that gets fully documented in the conclusion records that get created.

I don't see anything that you wrote that is contrary to the research process as I understand it, or anything that cannot be supported by the E & C Model as I conceive of it.

ttwetmore 2011-03-10T05:22:23-08:00

Louis,

You wrote, "I don't really like your mixing of evidence persons with conclusion persons under the INDI record. I prefer keeping the evidence people under the EVID record."

Yes, I've known that for a long time. And that is why we don't agree. We are not saying the same thing in our models, and we are not converging. It's a shame.

I basically insist that there has to be just one record type for both evidence and conclusion persons, because I know that the research process is not a two level system. There isn't just evidence and just conclusions. Evidence is the raw data of reasoning. From that evidence you make your first level of conclusions. From those conclusions you might make higher conclusions, and so on. In most cases there will only be two levels, yes, but any person you are researching seriously, doing original research on, will end up being a far more complex case and your model will break down. For the Daniel (and other) Wetmores of southeastern Connecticut in the 19th century I now have many hundreds of Evidence Records taken from vital records, land records, census records, immigration records, city directories, and so forth. There are many intermediate conclusions that I have had to make about those records as I have slowly come to fully understand the real human beings involved. Using your model for this case would have been impossible for me.

Here is the real bottom line and why I can't agree with your approach:

In my approach there is one record type instead of the two in yours. Not only does this lead to a simpler model than yours, and not only can my model handle all the situations that yours can, it can also go much further and handle all the multi-tiered cases that yours cannot.

Note that in a person record in my model you don't ever have to state that a person record is evidence or conclusion. That fact is inherent in how that record is sourced. The distinction is so simple that the software application can decide for you. All it has to do is look at the source.
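A sketch of how an application might make that decision is below. It rests on an assumption not spelled out above: that a Source record can be recognized as either source material or a researcher's proof statement (here via a hypothetical `is_proof_statement` flag). All names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    """Hypothetical source record; the flag is an assumption of this sketch."""
    title: str
    is_proof_statement: bool = False


@dataclass
class Person:
    name: str
    source: Source
    subs: List["Person"] = field(default_factory=list)


def role(person: Person) -> str:
    """Derive the role from the sourcing alone; nothing is stored on the record."""
    return "conclusion" if person.source.is_proof_statement else "evidence"


directory = Source("city directory")
proof = Source("proof statement joining two directory entries",
               is_proof_statement=True)

entry_a = Person("D. Wetmore", directory)
entry_b = Person("Daniel Wetmore", directory)
joined = Person("Daniel Wetmore", proof, [entry_a, entry_b])

print(role(entry_a), role(joined))
```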

Genealogy is sloppy, sloppy, sloppy. The data you get has sources that range across an unbelievably vast range of levels, qualities and so forth. Your model gets caught by this complexity because you have to make continual decisions about what constitutes evidence and what doesn't. Yeah, when you look at a city directory this is obvious. But when you download a GEDCOM file it becomes almost impossibly complex. My model rides on top of that complexity. Everything is a person record. The trees of person records get built up as you make your decisions. Each new person "node" in the tree gets its proper justification. This whole network has a consistency and integrity that you can cast over any difficult situation or case you come up with.

I wonder why you are so committed to a two entity model that limits your capabilities. You are a rational being, so I assume you think your model handles all the situations in a better way than mine does. I simply can't see that. But I accept the possibility that I am so fixated on my approach that I am blind to something else, and that there may be a wonderful "aha" moment awaiting me in the future. I hope so, because I love "aha" moments.

GeneJ 2011-03-10T07:02:47-08:00

Errr... (should have included) might indirectly support a larger group of evidence that his middle name was Cott or Charles or Clark.

louiskessler 2011-03-10T07:41:14-08:00

Tom,

"Your model gets caught by this complexity because you have to make continual decisions about what constitues evidence and what doesn't."

I don't see that at all. My model is simple evidence-based. You get any source, and then you extract all the evidence from it. I don't need an extra evidence record if that's what you don't like. All the evidence derived from a source could be put all under the SOUR record.

"I wonder why you are so committed to a two entity model that limits your capabilities."

I see such similarities between our methods that I don't see a difference in capabilities. So I guess I'm missing something. I just find evidence persons and conclusion persons a very confusing concept. I don't want multiple INDI records in my file if they are the same person (one as a conclusion and one or more - possibly a lot as evidence). If I export information, I want to export just the conclusion, and refer to the evidence. And if as a programmer I find it confusing, I'm sure many users of genealogy programs will as well.

So all I've really done is I've turned your evidence people into my evidence records. Is that really that different?

Doing so allows all the evidence derived from a single source to be grouped together logically, and that's how I want to organize and display it in my program.

Louis

ttwetmore 2011-03-10T10:12:54-08:00

Louis,

You said, " I don't want multiple INDI records in my file if they are the same person (one as a conclusion and one or more - possibly a lot as evidence). If I export information, I want to export just the conclusion, and refer to the evidence."

Whooah! That statement floors me completely. This may be the crux of the difference. Are you saying that for every person (meaning real human being) in your database you only want one person record? You want all the information from all the evidence to accumulate inside one summary/conclusion person record for each human being? You don't want to keep your evidence as separate, permanent records?

In my city directory example, say I have found 150 mentions of Wetmores in Norwich, Connecticut, for the years 1859 to 1899. And let's say that through my brilliant cognition, I have concluded that those 150 mentions boil down to 18 different real human beings.

In this example, my database would have AT LEAST 168 person records, 150 for the evidence-level persons and 18 for the final/top conclusion persons for the real human beings. But there might be MORE if I had decided to group some of the evidence records together before the final decisions were made that led to the last 18. There would also be a separate source record for each individual city directory.

How many person and/or evidence records do you want in your database for this example? If you say 18, then I contend that you are not using an Evidence and Conclusion Process because you destroy your Evidence as you accumulate conclusions -- you are on a one way road without the ability to move backwards or fix mistakes. If you say exactly 168 records (150 evidence records and 18 person records) then you are using the strict, two-tier approach, which is what I assumed you were doing in my last post. If you do say 168 then everything I said in my last post to you still stands, and I don't think you are seeing the hierarchical nature of the research process that the E & C Model is designed to support. If you say more than 168 then I would ask how you represent the intermediate conclusions.
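
As a rough illustration of the counting above, here is a minimal Python sketch. The Person class, its field names, and the way the 150 mentions are split among the 18 people are all assumptions invented for the example; only the totals come from the city directory scenario.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical person record; `subs` lists the records it concludes over."""
    label: str
    subs: list = field(default_factory=list)


def count_records(roots):
    """Return (evidence, intermediate, top) counts for a forest of person trees."""
    evidence = intermediate = top = 0
    stack = [(root, True) for root in roots]
    while stack:
        person, is_root = stack.pop()
        if not person.subs:
            evidence += 1          # leaf: created directly from a source
        elif is_root:
            top += 1               # root: a current final conclusion
        else:
            intermediate += 1      # in between: an earlier grouping decision
        stack.extend((sub, False) for sub in person.subs)
    return evidence, intermediate, top


def build_example():
    """Build 18 conclusion persons over 150 mentions (the split is invented)."""
    sizes = [9] * 6 + [8] * 12     # 6*9 + 12*8 = 150
    roots, n = [], 0
    for i, size in enumerate(sizes):
        leaves = [Person(f"mention {n + k + 1}") for k in range(size)]
        n += size
        roots.append(Person(f"human {i + 1}", leaves))
    # One intermediate conclusion: two mentions grouped before the final decision.
    first = roots[0]
    first.subs = [Person("early grouping", first.subs[:2])] + first.subs[2:]
    return roots


evidence, intermediate, top = count_records(build_example())
print(evidence, intermediate, top, evidence + intermediate + top)
```

With the single intermediate grouping the sketch holds 169 records in total. A strict two-tier database would always report zero intermediates, and a database that keeps only the 18 would have no leaves left to count.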

You say you don't want to export evidence. Again I am shocked. I hope this means we have different definitions for evidence. If I can't get the evidence when I import data from somewhere, I don't want that data as it will junk up my databases with unsubstantiated information.

louiskessler 2011-03-10T12:12:35-08:00

Tom,

Yes. 18 INDIs, and the 150 evidence-level persons would either be in the SOUR or EVID records (depending on implementation).

And all this information would get exported. It's just that the evidence individuals get exported embedded in the SOUR or EVID.

You and I agreed that it's the same thing with a different viewpoint back in our Boo Boo Bear discussion:
http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138 where you concluded near the end of that thread with:

"At this point I think we understand each other on this point, see that we both have the same underlying model in mind, and see how easy they map."

We're just rehashing the same thing over again, so we really needn't go on.

Louis

ttwetmore 2011-03-10T15:57:01-08:00

Louis,

You were right to remind me. Sorry.

So in conclusion you promote a two entity, two-tier approach to evidence and conclusions.

I promote a one entity, n-tier approach to evidence and conclusion.

'Nuff said (for now anyhow!)

TW

AdrianB38 2011-03-11T14:55:24-08:00

I've not been able to follow all threads this week so you'll forgive me I hope if my question is dealt with elsewhere but this seems as good a place as any to ask it...

How could we organise the storage of statements of negative evidence that I am coming to believe are as important as the positive / direct evidence?

In case it's not clear, let me give an example: suppose I have a baptism of a person of name X, in year Y, and place Z. I also have, some time later in a census, details of another person of name X, approximate year of birth Y and place Z. Somewhere we'll say that I store the details of my first X and, separately, my second X. Storing the direct details seems obvious - name, birth-date, birth-place (sort of...). But to allow me to match those 2 and create a conclusion, I also need to record that
(a) there is no other person of that name, birthplace and birth-year (roughly) in church or registration records;
(b) there is no other person of that name, birthplace and birth-year (roughly) in equivalent censuses (just in case someone else was born who is not baptised and not registered, and who survives to the censuses)
(c) there is no other person of that name, birthplace and birth-year (roughly) who dies between their birth and the censuses (just in case someone else was born who is not baptised and not registered and survives to the censuses while the baptised child, who I would otherwise imagine to be the one in the census, dies before the census, leaving the unbaptised / unregistered child to survive to the census.)

Hope all that makes sense...

The (c) bit is particularly interesting to me since it refers to entirely negative evidence - i.e. in _my_ conventional GEDCOM view of the world, I wouldn't have had a Source record for this as all my sources consist of direct evidence.
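
One way to picture storing such statements is as a record of each search and its outcome. The following Python sketch is only an assumption about what such a record could look like; the NegativeSearch name and its fields are hypothetical, the year 1850 is a placeholder for Y, and nothing here is part of GEDCOM or of any model discussed in this thread.

```python
from dataclasses import dataclass


@dataclass
class NegativeSearch:
    """Hypothetical record of a search made to rule out other candidates."""
    scope: str              # which records were searched
    name: str
    place: str
    approx_year: int
    other_candidates: int   # how many OTHER matching people turned up


def match_is_supported(searches):
    """True if at least one search was recorded and none found a rival."""
    return bool(searches) and all(s.other_candidates == 0 for s in searches)


searches = [
    NegativeSearch("church and registration records", "X", "Z", 1850, 0),  # (a)
    NegativeSearch("equivalent censuses", "X", "Z", 1850, 0),              # (b)
    NegativeSearch("deaths between birth and census", "X", "Z", 1850, 0),  # (c)
]
print(match_is_supported(searches))
```

The point of the sketch is that a search that finds nothing still leaves a record behind, so the conclusion joining the two X persons has something to cite for (a), (b) and (c).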

GeneJ 2011-03-12T09:56:45-08:00

Perhaps also that no other family by that surname was found recorded in deeds, church and/or civil, cemetery and/or probate ...

Sometimes far more specific. Obituary of XXX mentions the names of three surviving children, no son "John" is mentioned. Will of ABC ....

Part of the problem with capturing negative or even indirect evidence, maybe the bigger part, is that you often don't know enough to spot it at the time you made your original entries from the source. That's where recording extracts and notes into a research log is a big help.

gthorud 2011-03-12T11:36:21-08:00

Adrian,

Tom may correct me, but if I have understood the E&C model, the question is whether it can handle the recording of the death event that proves that there are two different persons – in the context of the census evidence person.

You will create a new conclusion person for the dead person, possibly breaking up a previously recorded conclusion person.

Then you perhaps have three options:

Do nothing more,

Or, do you record the death event evidence person twice, based on one source, and combine it with the previously recorded birth-census conclusion person, to form a new conclusion person – the census person – based on three evidence and one conclusion person – in addition to creating persons for the birth-death person.

Or, do you record evidence persons for all three events and have one conclusion person ending up with the census person?

I have been looking at the requirement for administrative data. The Gentech model has an entity called Objective, representing a problem/question you want to solve/get answered. The fact that you did the lookups described in a) b) and c) is modeled as Searches in this model, each search linked to an objective. The result can, e.g., be recorded in the search entity. The Gentech model may be under-developed in this area, or I have missed something, but at least in the diagram, there seem to be no links between the Objective entity and Assertions or other conclusion entities.

The same under-development seems to be the case with integration of the E&C model and the administrative info describing a research process, its findings and evidence evaluation.

ttwetmore 2011-03-12T12:15:16-08:00

Adrian,

In your example, I don't see why you need to record a, b or c. Can you enlighten us why you think that? Why wouldn't you just join your two evidence records into a conclusion and be done?

GeneJ 2011-03-05T09:12:44-08:00

In the last Developer Meeting, we talked about Ancestry Insider's "Genealogical Maturity Model."

See http://ancestryinsider.blogspot.com/search?q=GMM

He has 5 evolutions in that discussion.

GeneJ 2011-03-09T10:42:45-08:00

One might say, "it all depends on what you mean by the 'Evidence and Conclusion Process,'" no?

As best I am able to determine, Tom, the Model represents your "Evidence and Conclusion Process."

I'm hoping you'll discuss here more not about the "Model" but the "Evidence and Conclusion Process" itself.

We started to discuss some of this in the definition of Individual. http://bettergedcom.wikispaces.com/message/view/Glossary+Of+Terms/35106830

Hoping to warm up this thread, from that discussion:

We might differ on what the evidence-conclusion process is, I don't know.

* There is information ABOUT a source, and information IN the source. The information IN the source (if you will, the "data") is not more important than the information ABOUT the source. To support the Evidence-Conclusion process, you have to get that balance right. I don't think the Model has that balance yet--the Model emphasizes data IN the source. (Is this concept a little US-centric, because of the array of jurisdictions, source types, etc.?)
* The sourcing process (understanding and recording information ABOUT the source) is too misunderstood, too bulky, too frustrating and error prone, too user-to-user unfriendly, and it's traditionally been poorly supported by the mega-sites. Indeed, this part of the process is skipped or otherwise short changed by many users. If we were working to support the process, then before we worked with "data," I think we'd deal with the sourcing process.

I don't find the Evidence Explained source-citation system complicated--but making it work in today's genealogical software is painful!! To make it work in both software AND GEDCOM might alone be worth certification.

Getting sources (ala, master source and full reference note type entries) entered in good form to the database real time during the research process seems the NUMBER ONE stumbling block to the evidence-conclusion process. [...]

Separately then, there is a difference between information and "evidence." The latter is supposed to be relevant to the problem. It can be direct, indirect or negative. In the Model, is there a step before the data is entered in which the information is determined to qualify as evidence? Where is the step that identifies indirect evidence? Negative evidence?

Aside from qualifying as relevant, then where is the step when evidence is compared and contrasted with all the other known information about the person, the family, the town, the times, etc. in order to learn synergies and find possible conflicts?

Since we are talking about BetterGEDCOM, when sources and these Model steps get recorded and shared, won't other ppl be zapping them into their files? Does that mean somehow "your steps" become recorded as "their steps." Do your sources become recorded as their sources?

ttwetmore 2011-03-09T12:40:15-08:00

GeneJ's comments are in <<>>'s:

<<One might say, "it all depends on what you mean by the 'Evidence and Conclusion Process," no?>>

When I say the Evidence and Conclusion Process I am talking about the standard research process described in text books and taught at genealogical conferences: set your goals, plan your search, collect your evidence, reason about the evidence, make and "publish" your conclusions; repeat at any point as needed. I call it E & C P to stress the fact that both Evidence and Conclusions are required in application programs that support the process. This is not to be contrary to the conventional name of it, but to try to give the more geeky, software-centered genealogists, who are naive about professional scholarly genealogy, a catch phrase they can hang onto. I've been at this proselytizing for a paradigm shift in models to support evidence and conclusions for a long time (17+ years), and I have gotten comfortable with a few catch phrases I have found useful around geeks. Be assured -- there is nothing in the E & C process that contradicts or changes the standard process.

<<As best I am able to determine, Tom, the Model represents your "Evidence and Conclusion Process.">>

I have been sloppy with terminology. The BG Model must be able to support the full research process. We first need a model that can support today's "normal" conclusion-only based applications, but, hiding in the wings, that BG Model must have built into it the potential to be the model for applications that support the full research process.

Let me be as precise as I can. Let's say an application fully supports the future BG Model by representing the BG entities and relationships directly in its database. Obviously the research process is not that database, so I'm definitely not saying that the research process is the model. However, I am saying that the current "state" of where the researcher is along the steps of following that research process, MUST BE captured exactly by the state of the entities and the relationships in the database. So every time a user carries out a step in the research process (adds a repository, adds a source, adds evidence extracted from a source, makes a conclusion about the evidence), there is a precisely understood and exactly corresponding change that takes place in the database. The steps the researcher takes and the state of the database change in lock step with one another. The current state of research is always reflected in the current state of the database and vice versa.

I believe it is a MANDATORY goal of the Better GEDCOM effort to create a data model that makes the design of such databases possible. I use the term Evidence and Conclusion Model to refer to the models that have this capability.

It was the purpose of the long message that started this thread to explain in some detail how an Evidence and Conclusion Model that allows Person and Event Records to be structured into trees, is exactly suited for this task. In that note I demonstrated how every step a researcher using the normal research process would take, is reflected by an exact and corresponding change in the underlying model. When a researcher adds evidence, a new Evidence record is added. When a researcher makes a conclusion, a new Conclusion record is added that binds the records that provide the evidence for the conclusion together with a new record holding the proof-statement that justifies the conclusion. In the model, every conclusion step is implemented by the formation of a new conclusion object in the database. At any given time, the state of the database reflects exactly the evidence the researcher has gathered and the conclusions he/she has made, at that same exact time.
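
The lock-step idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a proposed BG design: the ResearchDatabase class, its method names, the id scheme, and the placeholder strings are all hypothetical.

```python
class ResearchDatabase:
    """Hypothetical store in which every research step adds exactly one record."""

    def __init__(self):
        self.records = {}   # record id -> dict of fields
        self.log = []       # one entry per research step, in order

    def _add(self, prefix, **fields):
        rid = f"{prefix}{len(self.records) + 1}"
        self.records[rid] = fields
        self.log.append(rid)
        return rid

    def add_repository(self, name):
        return self._add("R", kind="repository", name=name)

    def add_source(self, title, repository):
        return self._add("S", kind="source", title=title, repository=repository)

    def add_evidence(self, source, text):
        return self._add("E", kind="evidence", source=source, text=text)

    def conclude(self, subs, proof):
        # A conclusion only points at existing records; it never edits them.
        missing = [s for s in subs if s not in self.records]
        if missing:
            raise KeyError(f"unknown records: {missing}")
        return self._add("C", kind="conclusion", subs=list(subs), proof=proof)


db = ResearchDatabase()
repo = db.add_repository("some library")
src = db.add_source("some city directory", repo)
e1 = db.add_evidence(src, "first entry")
e2 = db.add_evidence(src, "second entry")
c1 = db.conclude([e1, e2], "proof statement: both entries are one person")
print(db.log)
```

Five research steps give five records, and the order of the log is the order of the steps. Undoing a conclusion in such a store would mean removing one conclusion record, with the evidence records left exactly as they were.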

<<I'm hoping you'll discuss here more not about the "Model" but the "Evidence and Conclusion Process" itself.>>

I hope at this point you have reread the original post, and the paragraphs above, and understand how I view the process and how the model and the process interact with one another. The easiest geeky way to say it is that the state of the model/database holds the research state. But of course, if you have ever worked on complex software, you already know that this is one characteristic that is common to all good software systems.

<<We might differ on what the evidence-conclusion process is, I don't know.>>

I don't think we do. I think it is the fact that I call it the Evidence and Conclusion Process that gets in the way.

<<I don't find the Evidence Explained source-citation system complicated--but making it work in today's genealogical software is painful!! To make it work in both software AND GEDCOM might alone be worth certification.>>

I agree, it's conceptually simple. I think many models try to implement it, but that the applications don't provide the user interface capabilities that make using it easy.

<<Getting sources (ala, master source and full reference note type entries) entered in good form to the database real time during the research process seems the NUMBER ONE stumbling block to the evidence-conclusion process. [...]>>

I agree. It is simply too easy to be lazy; not really lazy, but too easy to jump into the juicy data without taking care of your administrative chores.

<<Separately then, there is a difference between information and "evidence." The latter is supposed to be relevant to the problem. It can be direct, indirect or negative. In the Model, is there a step before the data is entered in which the information is determined to qualify as evidence? Where is the step that identifies indirect evidence? Negative evidence?>>

Good question. Here is my view. A researcher generally wants to record every "thing" that he/she thinks might be relevant later when it's time to reason about things and make conclusions. Of course, he/she might skip certain things that seem unimportant at the time and that come back to haunt later. So I look at everything that a researcher chooses to add to the database as evidence. It might not turn into evidence that is ever used in making an important decision or conclusion, but that can't be known up front. To answer your question, I think there is such a step that happens before the data is entered into the database, and that step is simply the researcher deciding what to record and what not to record. I haven't worried about where and when evidence gets tagged as direct or indirect or conflicting. I simply imagine that these determinations don't really apply to the evidence but to how the evidence is used to make a conclusion. I'm assuming that it will be a detail of the model to decide how this tagging gets represented.

<<Aside from qualifying as relevant, then where is the step when evidence is compared and contrasted with all the other known information about the person, the family, the town, the times, etc. in order to learn synergies and find possible conflicts?>>

This is the reasoning step. It may be a while before we can expect our applications to do our reasoning for us, so for the time being, this is a step that the researcher must do on his/her own. However, the application can provide some nice support for this, by making it easy for the researcher to gather, see and "move around" and "drill down" into all the current evidence and conclusion records that are germane to his/her current deliberations. During the reasoning step the researcher is not making any changes to the database, so there are no changes in the model that directly reflect the reasoning process. When the researcher is reasoning, he/she is in the process of deciding what changes to make in the model next. Of course the application itself can and should keep track of the groups of records that the researcher is currently deliberating about, so that the group can be quickly accessed whenever the researcher wants to return to working on the research goal that those records represent. It would be very easy to add the idea of an "arbitrary group of persons" to the BG Model if we wanted formal model support for this (in fact, many of our models do talk about a Group entity that can serve this purpose).

<<Since we are talking about BetterGEDCOM, when sources and these Model steps get recorded and shared, won't other ppl be zapping them into their files? Does that mean somehow "your steps" become recorded as "their steps." Do your sources become recorded as their sources?>>

At any given time the state of a BG database reflects the current state of research. When that database is transported to another application or written to a file, that current state of research is what is transported or stored. The receiver doesn't "see" the steps, he/she just sees the collective whole of all the evidence that has been collected and all the conclusions that have been made.

I have two things to say about that. First, is that in most cases that is probably what one wants to have happen. Second, if that is not what the exporting user wants to happen, then an application should/could easily implement a customizable export feature that can filter or otherwise limit the kinds of information that get exported.
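
A customizable export of that kind could be as small as a filter over the records. The Python below is a sketch under assumptions: the record layout (a dict with "kind", an optional "refs" list and an optional "private" flag) and the export function are hypothetical and only serve to show the idea.

```python
def export(records, keep):
    """Return the records accepted by `keep`, with dangling references removed."""
    kept = {rid: dict(rec) for rid, rec in records.items() if keep(rec)}
    for rec in kept.values():
        if "refs" in rec:
            # Do not point at records that were filtered out of the export.
            rec["refs"] = [r for r in rec["refs"] if r in kept]
    return kept


records = {
    "S1": {"kind": "source"},
    "E1": {"kind": "evidence", "refs": ["S1"]},
    "E2": {"kind": "evidence", "refs": ["S1"], "private": True},
    "C1": {"kind": "conclusion", "refs": ["E1", "E2"]},
}

# Export everything except records the user has marked private.
public = export(records, lambda rec: not rec.get("private"))
print(sorted(public))
print(public["C1"]["refs"])
```

The original records are left untouched because each kept record is copied before its references are trimmed.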

As to your question about whether "my sources become your sources" I would say, yes if you want them to. In our current thinking, sources are "just" another type of BG entity and record so it has no special properties.

But, but, but, but this has very serious implications does it not? The same serious implications that exporting place records has. I won't go down that path too far right now, as it is a big topic that deserves lots of attention. But consider this. If an exported BG file contains the place records from the exporting program, what happens when those place records CRASH INTO the place records that are already in the importing application's database? This is a very biggie. The problem of sharing sources is similar. The place problem can be "solved" by appealing to a standard place hierarchy provided by standards or an agreement to use one set of places in all applications, but what is the chance that that would happen in our lifetimes? The source problem could be partially solved by deciding on a standard source for source descriptions -- does the Library of Congress have such a standard database for all the items that it indexes? If so, we should use that as our standard source database.
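
To make the place collision concrete, here is a deliberately naive Python sketch of an import step. Everything in it is an assumption for illustration: the id scheme, the import_places function, and above all the matching rule, which treats two places as the same only when their comma-separated levels match exactly after trimming and lower-casing.

```python
def normalize(place):
    """Reduce a comma-separated place hierarchy to a comparable key."""
    return tuple(part.strip().lower() for part in place.split(","))


def import_places(existing, incoming):
    """Merge incoming place records into a copy of the existing ones.

    Both arguments map record id -> place string. Returns (merged, id_map),
    where id_map says which id each incoming record ended up with.
    """
    merged = dict(existing)
    by_key = {normalize(text): rid for rid, text in existing.items()}
    id_map = {}
    for old_id, text in incoming.items():
        key = normalize(text)
        if key in by_key:
            id_map[old_id] = by_key[key]          # same place: reuse the record
            continue
        new_id, n = old_id, 1
        while new_id in merged:                   # id collision: pick a fresh id
            new_id = f"{old_id}_{n}"
            n += 1
        merged[new_id] = text
        by_key[key] = new_id
        id_map[old_id] = new_id
    return merged, id_map


existing = {"P1": "Norwich, New London County, Connecticut, United States"}
incoming = {
    "P1": "Saint John, New Brunswick, Canada",
    "P2": "norwich, New London County, Connecticut, United States",
}
merged, id_map = import_places(existing, incoming)
print(id_map)
```

The sketch handles the two easy cases, a reused id and an identical place under a different id. It would not recognize "Norwich, Connecticut" as the same place as the four-level form, which is exactly where a shared standard place hierarchy would be needed.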

GeneJ 2011-03-09T13:59:01-08:00

Maybe in small pieces with sharp pencils.

You write, "set your goals ... plan your search, collect your evidence, reason about the evidence..."

To me, that "stuff" in the sources is just information (just data .. like an "exhibit"). It's not "evidence" until it's been sort of reasoned and seasoned.

It rubs me to hear "bits" stripped from that information referred to as "evidence" or "actual evidence."

Moreover, when those "bits" are stripped off, it seems more like information taken out of context--meaning it is even less valuable to me as information.

Because the presentation of those 'bits' is made separate from the full reference note, the bits seem further degraded.

How do we reconcile this? --GJ

P.S. You wrote, "When I say the Evidence and Conclusion Process I am talking about the standard research process described in text books and taught at genealogical conferences."

Any particular work, lecture or class out there that guides your thinking?

ttwetmore 2011-03-09T18:51:00-08:00

Here is my model and belief of where evidence comes from and how it fits into the model. I will use a very simple example and point out where I think there are possibilities for disagreement.

Example, on page 96 of the 1886 Norwich, Connecticut, City Directory, that I found in the New London, Connecticut, Public Library, there is an entry that says,

"Daniel C. Wetmore, carpenter, N L T"

Here's what I think we have:

1. A repository, the New London Public Library; we create a repository record and add it to our BG compliant database.
2. A source, the 1886 Norwich, Connecticut, City Directory; we create a source record and add it to the database and we make it refer to the repository record.
3. Lots of information on most pages of the directory, including one (at least) item of information that is of great importance for us, the line quoted above. I call that line evidence because not only is it an item of information, it is also information that applies directly to my research goal of discovering my ancestors; because it is info that I am going to use to help meet my goals, it is evidence. Is there any disagreement about this; if so what is that disagreement?
4. We next create a person record from that line; this is what I call an evidence person. I've said evidence person and conclusion person maybe a hundred times so far on this wiki, and now we are where the rubber hits the road. This person record that I am talking about here, is exactly what I mean by an evidence person, and this is one of the main ways that evidence records get into our databases. It is evidence packaged into a computer representation; note that the record cannot ever change unless you are correcting an error inserted at its creation. If we were following an E&C Process (the normal research process), this evidence record is permanent. It can be used to make a conclusion, but using it in this way never changes it. If I were using my LifeLines program to create this evidence record, here is exactly what that record would contain:

0 @I1@ INDI
1 NAME Daniel C /Wetmore/ <<-- I know where the surname is so I add the slashes; you might argue against this on purist grounds.
1 SEX M <<-- I add sex even though it's not stated in the evidence; this could be argued against on purist grounds; but I know women are mentioned in this directory only if they are widows or have a profession.
1 RESI <<-- I turn city directory entries into residence events; that is, I don't think we need "city directory" events; you could argue about this also.
2 DATE 1886 <<-- The date of the city directory is not always the date that is truly accurate for the entry (the data might have been collected the year before), but what are you trying to do, solve world hunger?
2 PLAC Norwich, New London County, Connecticut, United States <<-- I know the details of the geographic hierarchy, so I add them; could be argued against on purist grounds.
3 ADDR New London Turnpike <<-- I know that the abbreviation N L T is used in this directory to mean New London Turnpike; could be argued against on purist grounds.
1 OCCU Carpenter <<-- I capitalized it; yikes; could be argued against on purist grounds.
1 SOUR @S1@ <<-- We have to link to the record that represents the source to show where this evidence came from; whew, one with no need to argue about.
2 PAGE 96 <<-- If we want to provide exact detail of where the evidence came from within the source, we add it here.
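
For readers who want to see the nesting of such a record, the following Python sketch parses GEDCOM-style lines into a tree. It is a minimal illustration, not LifeLines code: the parse_gedcom function and the node layout are assumptions, and treating "<<--" as the start of an annotation is a convention of this post only, not of GEDCOM.

```python
SAMPLE = """\
0 @I1@ INDI
1 NAME Daniel C /Wetmore/ <<-- slashes added around the surname
1 SEX M
1 RESI
2 DATE 1886
2 PLAC Norwich, New London County, Connecticut, United States
3 ADDR New London Turnpike
1 OCCU Carpenter
1 SOUR @S1@
2 PAGE 96
"""


def parse_gedcom(text):
    """Parse 'level [xref] tag [value]' lines into nested dict nodes."""
    root = {"tag": "ROOT", "xref": None, "value": "", "children": []}
    stack = [(-1, root)]                      # (level, node) from root to current
    for raw in text.splitlines():
        line = raw.split("<<--")[0].strip()   # drop this post's annotations
        if not line:
            continue
        parts = line.split(" ", 2)
        level = int(parts[0])
        if parts[1].startswith("@"):          # record line: level xref tag
            xref = parts[1]
            tag, _, value = (parts[2] if len(parts) > 2 else "").partition(" ")
        else:                                 # ordinary line: level tag value
            xref, tag = None, parts[1]
            value = parts[2] if len(parts) > 2 else ""
        node = {"tag": tag, "xref": xref, "value": value, "children": []}
        while stack[-1][0] >= level:          # climb back up to the parent level
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root["children"]


person = parse_gedcom(SAMPLE)[0]
for child in person["children"]:
    print(child["tag"], child["value"], [c["tag"] for c in child["children"]])
```

The parsed tree makes the structure explicit: DATE and PLAC hang under RESI, ADDR under PLAC, and PAGE under the record-level SOUR.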

For me this is how evidence should enter a genealogical application. You could argue with me some about this because I have to do some "processing" of the info that was found on the line, to get it into shape for putting it in the record. Some might argue that this processing makes this record something more than just evidence, because we have applied some brain power to it. If you argue that, I will agree with you wholeheartedly. However I will still call this record evidence, because I believe it is the best we will ever do. Not only do I think it is the best we will ever do; I think that we lose nothing by doing this; if done right it misses no information that it should have, and it adds no information that it shouldn't have.

Some will argue that we need to create a whole other record, let's call it just an evidence record (say we have an entity in the model named evidence). In that record we could put as exact a transcription of the line in the directory as we can. The goal here is to create an evidence record that does not interpret what the evidence means in any way. In this view we have records in a sequence like this -- repository, source, evidence, person -- and I'd still call the person record an evidence person since it is now based directly on an evidence record. I think this is a very good argument also, and I've held this exact view myself many times over the past 15 years. No one ever accuses me of being consistent.

I have bared my soul. Using an oft-repeated quote, "This I believe." If others believe other things about where evidence comes from and how it gets into our databases and what form it takes once it gets there, some of us would like to hear what it is so we can compare notes.

And by the way, there are other ways that evidence enters a database in my opinion. I tried to pick an absolutely trivial example to get started. I could write ten times more about what I believe is the proper way to treat a census record, and what form the evidence records that come from that kind of evidence should take.

louiskessler 2011-03-09T20:56:54-08:00

Tom,

I personally see a problem in placing the SOUR tag at level 1. To me it indicates that just "the individual" is attributed to that source. To attribute the NAME, SEX, RESI and OCCU to it, you'll have to place a level 2 SOUR tag under those.

My reasoning here is what if another bit of evidence came along that added additional info to the individual. You wouldn't be able to tell what bits were from which source.

I believe we may have argued this in the past, with you taking the view that the SOUR should be at the highest levels, and me believing it should be either at the lowest levels, or at least at level 2 so that event sources can be attributed to the correct evidence.

The alternative would be to have evidence records, and place all the information there, and not have the sourcing in the INDI records at all, e.g.:

0 @E12@ EVID
1 INDI
2 UID 27ACE363CA3ED711900FF9FD6B6752460891
2 NAME Daniel C /Wetmore/
2 SEX M
2 RESI
3 DATE 1886
3 PLAC Norwich, New London County, Connecticut, United States
4 ADDR New London Turnpike
2 OCCU Carpenter
1 SOUR @S1@
2 PAGE 96

Then, the INDI can be left just for the conclusions, e.g.:

0 @I1@ INDI
1 EVID @E12@
2 UID 27ACE363CA3ED711900FF9FD6B6752460891
1 NAME
2 CONCL I know where the surname is so I add the slashes; you might argue against this on purist grounds.
1 SEX
2 CONCL I add sex even though it's not stated in the evidence; this could be argued against on purist grounds; but I know women are mentioned in this directory only if they are widows or have a profession.
1 RESI
2 CONCL I turn city directory entries into residence events; that is, I don't think we need "city directory" events; you could argue about this also.
2 DATE
3 CONCL The date of the city directory is not always the date that is truly accurate for the entry (the data might have been collected the year before), but what are you trying to do, solve world hunger?
2 PLAC
3 CONCL I know the details of the geographic hierarchy, so I add them; could be argued against on purist grounds.
3 ADDR
4 CONCL I know that the abbreviation N L T is used in this directory to mean New London Turnpike; could be argued against on purist grounds.
1 OCCU
2 CONCL I capitalized it; yikes; could be argued against on purist grounds.

The UID, EVID and CONCL tags of course do not currently exist in GEDCOM.

So the Conclusions are included with the tags in the Conclusion People. The data is included with the Evidence. The genealogy program should have no problem putting this together.

The beautiful thing here is that if the data is structured this way, then repositories can describe ALL their records using only SOUR and EVID records. Each repository can put up a database of their information that can be searched through. You can search for all records with name similar to Daniel Wetmore and occupation similar to carpenter. Then using the evidence records that you think (or don't think) refer to your conclusion persons, you can link them in very simply and update your conclusions on each event for that person.
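
A search of that kind can be sketched with Python's standard difflib module. This is only an illustration under assumptions: the flat record layout, the search function and the 0.8 similarity threshold are invented here, and a real repository would need a far better notion of name similarity.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    """Crude string similarity: ratio of matching characters, case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def search(evidence, name, occupation):
    """Return ids of evidence records whose name AND occupation are similar."""
    return [
        rec["id"]
        for rec in evidence
        if similar(rec["name"], name) and similar(rec["occupation"], occupation)
    ]


# Hypothetical flattened view of some EVID records in a repository.
evidence = [
    {"id": "@E12@", "name": "Daniel C Wetmore", "occupation": "Carpenter"},
    {"id": "@E13@", "name": "Daniel V C Wetmore", "occupation": "Shipwright"},
    {"id": "@E14@", "name": "Mary Smith", "occupation": "Teacher"},
]

print(search(evidence, "Daniel Wetmore", "carpenter"))
```

The hits are candidates only. Deciding whether a returned evidence record really refers to one of your conclusion persons remains the researcher's call.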

Hmmm. I went a little further than I thought I would in my comment. And I don't think what I wrote is perfect, but it is another possible model, and there are probably many other models possible. But this one will likely be similar to what I will eventually implement.

Louis

ttwetmore 2011-03-09T22:18:05-08:00

Louis,

We think differently. I don't like your approach. You have two "parallel" records to describe the same single item of evidence, with a separate conclusion tag created for every single field.

We apparently have very different views on what a conclusion is. You are using the word conclusion to mean a description of how a single field of information from a single item of evidence was interpreted. Personally I think that information is almost never needed, and if it is, it can be accommodated within a single evidence record. My definition of conclusion (from the point of view of the genealogical process of course) is a substantial decision made by a user that evidence that came from two or more different sources refers to the same person.

ttwetmore 2011-03-09T22:30:20-08:00

Let me try to extend the city directory example so there is a conclusion in the mix.

In the first example, we had the following line from the 1886 Norwich city directory: "Daniel C. Wetmore, carpenter, N L T"

Say we checked the directory two years later and found "Daniel V. C. Wetmore, shipwright, bds Thames"

This leads to two evidence persons:

0 @I1@ INDI
1 NAME Daniel C /Wetmore/
1 SEX M
1 RESI
2 DATE 1886
2 PLAC Norwich, New London County, Connecticut, United States
3 ADDR New London Turnpike
1 OCCU Carpenter
1 SOUR @S1@

0 @I2@ INDI
1 NAME Daniel V C /Wetmore/
1 SEX M
1 RESI
2 DATE 1888
2 PLAC Norwich, New London County, Connecticut, United States
3 ADDR Thames Avenue
1 OCCU Shipwright
1 SOUR @S2@

After the reasoning step let's say we decide that these two evidence persons refer to the same human being. We would then create a conclusion person record:

0 @I3@ INDI
1 INDI @I1@ <<-- This is the only tag needed to be added to GEDCOM to support this idea.
1 INDI @I2@
1 SOUR I have analyzed all the Wetmores from the Norwich city directories, and have concluded that these two evidence persons refer to the same human being.

In this case the conclusion person doesn't need any fields of data at all, as all that is inherited from the sub-Persons. I'll construct another model where this is not the case.
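
A minimal Python sketch of this inheritance, for illustration only. The `Person` class, its field names, and `inherited_facts` are hypothetical and are not part of GEDCOM or the DeadEnds model; the sketch assumes a conclusion person simply exposes whatever its evidence leaves hold.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    """Hypothetical in-memory form of an INDI record."""
    rec_id: str
    facts: dict = field(default_factory=dict)  # e.g. {"OCCU": "Carpenter"}
    source: str = ""                           # source pointer or proof statement
    subs: list = field(default_factory=list)   # lower-level persons (the 1 INDI lines)


def inherited_facts(person):
    """Collect (tag, value, record id) from every evidence leaf under person."""
    if not person.subs:
        return [(tag, value, person.rec_id) for tag, value in person.facts.items()]
    collected = []
    for sub in person.subs:
        collected.extend(inherited_facts(sub))
    return collected


i1 = Person("I1", {"NAME": "Daniel C /Wetmore/", "OCCU": "Carpenter"}, "@S1@")
i2 = Person("I2", {"NAME": "Daniel V C /Wetmore/", "OCCU": "Shipwright"}, "@S2@")
i3 = Person("I3", source="proof statement", subs=[i1, i2])  # no fields of its own

occupations = [value for tag, value, _ in inherited_facts(i3) if tag == "OCCU"]
```

Here `occupations` holds both recorded occupations, each still traceable to the evidence person it came from.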

ttwetmore 2011-03-09T23:03:18-08:00

Here is a slightly more complex example of a two evidence and one conclusion example.

Let’s say we have a birth record that gives the exact birth date of a person, but the birth place is just at the province level. Let's say the birth record is this item from some source:

"Daniel Van Cott Wetmore, born 13 September 1791, New Brunswick."

Say we find another birth record in some other source, with a shorter name, with less and conflicting info about the date, and more specific info about the place; let’s say that evidence looks like:

"Daniel Cott Wetmore, born 1792, Saint John, New Brunswick."

Let’s create the two evidence person records:

0 @I1@ INDI
1 NAME Daniel Van Cott /Wetmore/
1 SEX M
1 BIRT
2 DATE 13 September 1791
2 PLAC New Brunswick, Canada
1 SOUR @S1@ <<-- points to the first source

The lack of any 1 INDI line, together with the fact that the 1 SOUR line points to a real source, indicates that this record (and the next) are evidence records.

0 @I2@ INDI
1 NAME Daniel Cott /Wetmore/
1 SEX M
1 BIRT
2 DATE 1792
2 PLAC Saint John, New Brunswick, Canada
1 SOUR @S2@

We reason about these two records and decide they refer to the same human being. There is conflict in the names, in the birth years, and there is more detailed information about the same fields in the different records. In this case we might create the Conclusion Person as follows:

0 @I3@ INDI
1 NAME Daniel Van Cott /Wetmore/ <<-- Selects the fuller name
1 BIRT
2 DATE 13 September 1791 <<-- Selects the more detailed date
2 PLAC Saint John, New Brunswick, Canada <<-- Selects the more detailed place (from different evidence than the date)
1 INDI @I1@
1 INDI @I2@
1 SOUR … the proof-statement …

In this case fields are added in the conclusion person that show how the researcher has resolved the conflicts and dealt with the overlapping data. There is no need to add source references to these added top fields, since the proof-statement should cover choice selections, and the evidence persons can be consulted to find the individual source items when needed.
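
One way an application might read such a record can be sketched in Python. This is an assumption about application behavior, not something the model prescribes: a field stated on the conclusion person wins, and otherwise the values are gathered from the persons beneath it. The dictionary layout and the `resolve` helper are hypothetical.

```python
def resolve(record, tag):
    """Return the record's own value for tag, or else every value found below it."""
    if tag in record["facts"]:
        return [record["facts"][tag]]
    values = []
    for sub in record.get("subs", []):
        values.extend(resolve(sub, tag))
    return values


i1 = {"facts": {"SEX": "M", "DATE": "13 September 1791",
                "PLAC": "New Brunswick, Canada"}}
i2 = {"facts": {"SEX": "M", "DATE": "1792",
                "PLAC": "Saint John, New Brunswick, Canada"}}
i3 = {"facts": {"DATE": "13 September 1791",
                "PLAC": "Saint John, New Brunswick, Canada"},
      "subs": [i1, i2]}

birth_date = resolve(i3, "DATE")  # the researcher's choice on the conclusion person
sex = resolve(i3, "SEX")          # not restated, so taken from the evidence persons
```

The conflicting 1792 date is not lost; it stays on its evidence person and can still be reached by resolving against `i2` directly.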

The fact that this third Person record does have 1 INDI lines and a 1 SOUR line with a proof statement, indicates that this person record is a conclusion record. That is, the same person record format can be used for evidence and conclusion persons (and everything in between) and no tag is required to say what kind of person record each one is.
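
As a rough Python sketch of that structural test, assuming records arrive as lists of GEDCOM-style lines and that the presence of any level-1 INDI line is the whole criterion (the `record_kind` function is hypothetical):

```python
def record_kind(lines):
    """Classify an INDI record from its level-1 lines alone, with no extra tag."""
    for line in lines:
        parts = line.split(maxsplit=2)
        # A level-1 INDI line points at a lower-level person record.
        if len(parts) >= 2 and parts[0] == "1" and parts[1] == "INDI":
            return "conclusion"
    return "evidence"


evidence_record = [
    "0 @I1@ INDI",
    "1 NAME Daniel Van Cott /Wetmore/",
    "1 SOUR @S1@",
]
conclusion_record = [
    "0 @I3@ INDI",
    "1 NAME Daniel Van Cott /Wetmore/",
    "1 INDI @I1@",
    "1 INDI @I2@",
    "1 SOUR proof statement",
]
kinds = (record_kind(evidence_record), record_kind(conclusion_record))
```

Note that the level-0 `0 @I1@ INDI` line is not mistaken for a sub-person reference, because only level-1 lines are inspected.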

louiskessler 2011-03-10T00:08:52-08:00

"You are using the word conclusion to mean a description on how a single field of information from a single item of evidence was interpreted."

No. Each of my conclusions is a description on how a single field of information from ALL items of evidence was interpreted.

If for example, you had 3 items of evidence from 3 sources for, say, the birth event, then there would be one conclusion that describes which birth date and place is thought to be best amongst the evidence, and why it is better than the others.

louiskessler 2011-03-10T00:28:49-08:00

Tom,

Actually, we are almost saying the same thing, but just presenting it differently.

I don't really like your mixing of evidence persons with conclusion persons under the INDI record. I prefer keeping the evidence people under the EVID record.

Doing so, I can rewrite your example as this:

0 @E1@ EVID
1 INDI 27ACE363CA3ED711900FF9FD6B6752460891
2 NAME Daniel Van Cott /Wetmore/
2 SEX M
2 BIRT
3 DATE 13 September 1791
3 PLAC New Brunswick, Canada
1 SOUR @S1@ <<-- points to the first source

0 @E2@ EVID
1 INDI 2AACE363CA3ED711900FF9FD6B6752460BC1
2 NAME Daniel Cott /Wetmore/
2 SEX M
2 BIRT
3 DATE 1792
3 PLAC Saint John, New Brunswick, Canada
1 SOUR @S2@

Here, I've put the UID after the INDI tag to save a line and an extra tag. Although this example does not show it, one item of evidence can have many evidence persons, each starting at level 1. It can have places, or families, or groups, or events. Anything that is a level 0 record can be a level 1 entity within an EVID record.

0 @I3@ INDI
1 NAME Daniel Van Cott /Wetmore/ <<-- Selects the fuller name
1 BIRT
2 DATE 13 September 1791 <<-- Selects the more detailed date
2 PLAC Saint John, New Brunswick, Canada <<-- Selects the more detailed place (from different evidence than the date)
1 EVID @E1!27ACE363CA3ED711900FF9FD6B6752460891@
1 EVID @E2!2AACE363CA3ED711900FF9FD6B6752460BC1@
1 SOUR … the proof-statement

where the ! separates the record ID from the substructure ID number, which is a valid (although rarely used) construct in GEDCOM. It does make this a bit cleaner.
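
For readers who want to see how such a pointer could be taken apart, here is a small Python sketch. It assumes only the form shown in the example above (`@record!substructure@`); the `split_pointer` name is hypothetical.

```python
def split_pointer(pointer):
    """Split '@E1!UID@' into ('E1', 'UID'); a plain '@I1@' gives ('I1', None)."""
    if len(pointer) < 3 or not (pointer.startswith("@") and pointer.endswith("@")):
        raise ValueError(f"not a pointer: {pointer!r}")
    body = pointer[1:-1]
    record_id, separator, sub_id = body.partition("!")
    return record_id, (sub_id if separator else None)


with_sub = split_pointer("@E1!27ACE363CA3ED711900FF9FD6B6752460891@")
plain = split_pointer("@I1@")
```

An application would then look up record `E1` and search its level-1 entities for the one carrying the matching UID.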

Are we converging somewhat?

Louis

ttwetmore 2011-03-05T05:47:16-08:00

How Many Sources Can a Person Record Have?

A question that comes up in Better GEDCOM discussions asks what components of a Record can have their own references to Sources. For example can a Person’s name or sex have their own source? Can the date of a birth event have its own source? Can a note have a source? And so on.

In today’s genealogical applications a Person record may contain information from many sources. This is because the records are built up by the user over time by either merging other records together, or by adding new information from newly discovered sources. Therefore each component of a Person record needs to be able to have its own source reference.

But consider the Evidence and Conclusion (E & C) Process and the Person records needed to support it. By definition an Evidence Person record contains information from only a single source (all of it, or at least part of it). Therefore Evidence Person records require only a single source reference and that reference “covers” the entire record.

If the user then follows the E & C Process in the manner I have advocated, he/she will construct a new Conclusion Person record for every conclusion, and that Conclusion record contains references to the “lower” level Person Records covered by the conclusion. Each of these lower Person records already has its own source reference. The new Conclusion Person record therefore only needs its own, single source reference, and that source is the user's proof statement that justifies making the conclusion.

Therefore, if a genealogical application supports the E & C Process, and only the E & C Process, and supports it as I have outlined it through new Conclusion records, then every Person record always requires only one source reference.
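
The single-reference property can be illustrated with a short Python sketch. The record layout and the `source_trail` helper are hypothetical; the sketch assumes each record stores exactly one `source` value and an optional list of sub-person ids.

```python
def source_trail(rec_id, records):
    """List (record id, source) pairs from rec_id down to its evidence leaves."""
    record = records[rec_id]
    trail = [(rec_id, record["source"])]
    for sub_id in record.get("subs", []):
        trail.extend(source_trail(sub_id, records))
    return trail


records = {
    "I1": {"source": "@S1@"},
    "I2": {"source": "@S2@"},
    "I3": {"source": "proof statement", "subs": ["I1", "I2"]},
}
trail = source_trail("I3", records)
```

Walking the tree recovers every underlying source even though no record holds more than one reference.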

But if the genealogical application also supports today’s methodology, as outlined above, a different source reference could be needed for any component in the record.

Big question – should Better GEDCOM only allow a Person record to have a single source reference, or should Better GEDCOM allow every component of every record to have its own source reference? This question boils down to asking whether Better GEDCOM should insist that applications that support it require their users to follow a strict E & C Process, or should Better GEDCOM allow applications to also follow the "merge and burn" practices of today's applications?

ttwetmore 2011-03-05T14:41:53-08:00

TestUser,

A few comments on your comments:

"But a Conclusion Person always has more than one Source - albeit indirectly, via the Evidence Persons it links to."

Exactly, so you don't add those sources to the conclusion person at all, since the CP automatically "inherits" them from the evidence. But you do need to add a source to the conclusion record; that source is the proof statement that describes the researcher's rationale for making the conclusion. (Geir doesn't like calling this a source [see his message], but it really is; the source is the brain of the researcher, just as much as a census record might be the source of an EP.)

"If you want to restrict every Person Record to only one "direct" Source, then that should be fine, too. The Software just creates Evidence Persons for every PFACT, so you can give every PFACT a Source. On the Screen, the Person will have many PFACTs with different Sources, just like now. The users won't see a difference."

I was playing the devil's advocate a bit when I was talking about one source per record. I wanted to raise the point that in a pure E&C process that is all that would be needed. However in the current mentality this is such a radical thought that no one would take it seriously. This is one of the issues when trying to help drive a paradigm shift -- you have to find some way to gently prod people into thinking thoughts that they take so much for granted that they wouldn't ever think about them on their own.

"If you create new CPs all the time, it might get a bit "untidy", at least for a human eye."

Ah, to me this is a user interface issue. I think the user interface by default should show "just" the current batch of root level CP's, not the EP's or the intermediate CP's that make up the root level CP's. (Of course, if an EP has not yet been joined into a CP, it is, by definition, already a top level person, so all the "currently naked" EPs would also be shown by default.) After all, at any given point in a user's research these are the persons that the user is thinking about, and they represent the current "frontier" of their research. Of course, the user interface has to allow the user to "drill down" into the contents of the CP's, to see what they are made of, so they can be rearranged if needed. I really think it is the job of the user interface to take care of the tidiness issues!
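
A possible way to compute that default view, sketched in Python under the assumption that each record lists the ids of its sub-persons (the record layout and the `frontier` function are hypothetical):

```python
def frontier(records):
    """Return ids of records that no other record lists as a sub-person."""
    referenced = {
        sub_id
        for record in records.values()
        for sub_id in record.get("subs", [])
    }
    return sorted(rec_id for rec_id in records if rec_id not in referenced)


records = {
    "I1": {"subs": []},            # evidence person under I3
    "I2": {"subs": []},            # evidence person under I3
    "I3": {"subs": ["I1", "I2"]},  # root-level conclusion person
    "I4": {"subs": []},            # "currently naked" evidence person
}
shown_by_default = frontier(records)
```

Drilling down is then just a matter of following the `subs` list of whichever record the user selects.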

"How about this:
Don't create new Conclusion Persons when you add a conclusion. Keep the CP and just add/change the link(s) for the new PFACT(s). The "reason" or "proof statement" then attaches to the link(s). You would usually only have one CP for every Real Life Person. The PFACTs of this CP would link to any number of Evidence Persons. Old conclusions (=old links) usually don't get deleted, but "demoted". If you want to "merge and burn", then you can delete the old links."

Well, if you are willing to drop the term PFACT and replace it with EP you are right on. Remember, every person record, at least as we are thinking about them right now for this discussion, "contains" an array of lower level Person records that provide the next deeper layer of evidence for the top level one. In the "tree method" you can build up these trees of person records to any depth. What you are suggesting, I believe, is that instead of making a new conclusion record you simply add a new EP to an array under an existing CP. This works fine and is what I believe you should do AS LONG AS THE PROOF STATEMENT FOR THE CP WOULD REMAIN THE SAME. This is the criterion you have to use when making the decision to either ADD A NEW conclusion person record or ADD TO AN EXISTING conclusion person. I think you need to do both -- you obviously are going to be able to find evidence that supports a previously made conclusion, as you are going to find evidence that leads to new conclusions.
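
The criterion can be written down as a small Python sketch. Everything here is an illustrative assumption: the record layout, the `attach_evidence` name, and in particular the choice to place a new conclusion person above the existing CP and the new EP when the proof statement changes.

```python
def attach_evidence(records, cp_id, ep_id, proof_still_holds,
                    new_id=None, new_proof=""):
    """Attach evidence person ep_id, following the 'same proof statement' rule."""
    if proof_still_holds:
        # The existing justification covers the new evidence: extend the CP.
        records[cp_id]["subs"].append(ep_id)
        return cp_id
    # A new decision was made: record it as a new, higher conclusion person.
    records[new_id] = {"source": new_proof, "subs": [cp_id, ep_id]}
    return new_id


records = {
    "I1": {"source": "@S1@", "subs": []},
    "I2": {"source": "@S2@", "subs": []},
    "I3": {"source": "proof statement A", "subs": ["I1", "I2"]},
    "I4": {"source": "@S3@", "subs": []},
    "I5": {"source": "@S4@", "subs": []},
}
extended = attach_evidence(records, "I3", "I4", proof_still_holds=True)
created = attach_evidence(records, "I3", "I5", proof_still_holds=False,
                          new_id="I6", new_proof="proof statement B")
```

Whether the proof statement still holds is a judgment only the researcher can make, which is why it is passed in as a flag rather than computed.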

ttwetmore 2011-03-05T14:55:15-08:00

Comments on Geir's comments:

"BG must allow programs that do not support E&C.
Also, as I see it, BG has stated that we are going to be backwards compatible so an E&C program will have to be able to import data not using evidence persons. So there can be no question that BG must support Conclusion records, and they can have several sources (or rather citations)."

This is true. As I said in the response to testuser, I am trying to help push a paradigm shift, and the only way to do that is to push people into thinking about ideas they wouldn't normally consider. Of course, if BG supports current applications it has to work in a conclusion only world where records can have a zillion sources for every bit of every attribute. But if the BG model will truly also support the E&C process I think it is important to think about some of the ramifications of that. One of those ramifications is that every person record, evidence, conclusion, and intermediate, in that model, only needs one source! This is wonderful. I doubt that many people thinking about the E&C process have yet thought about that. I'm just trying to be my usual friendly and pedantic self.

"An observation that may need a separate discussion is: An importing application supporting only conclusions needs to do a merge, and the rules for this must be standardized."

Do you think this is a Better GEDCOM issue? I think the merging problem is fascinating. Some of the neatest algorithms I wrote in my old LifeLines program had to do with finding the balance point between what part of merging could be done automatically and what part had to be done with user input. But I have always thought of that as an application issue and not a model issue. Though I suppose with some stretch of the imagination it could be thought of as a model issue.

"A detailed remark: I would not call "the user's proof statement that justifies making the conclusion" a source, rather a research note."

I agree that calling this a source seems a little awkward. But the important point is that this proof statement has exactly the same role with respect to the conclusion person as a source reference has to an evidence person. One must come to grips with the fact that every record has to be justified somehow (and in conclusion models every component within a record can be separately justified), and this is done by referring to physical sources in evidence records and by referring to proof statements for conclusion records. Call these latter "research notes" if you like, but never forget that they are the source of the conclusion.

gthorud 2011-03-05T16:14:34-08:00

Tom,

We will have to see later if the standard needs to specify how the merging shall be done. If you think that this can be done in several ways, it makes me more certain that it needs to be specified. What I mean by rules is, where the information from one or more evidence/conclusion persons would end up in one conclusion person.

One interesting thing is, how would you merge the above "research notes" so that they will make sense in a conclusion person? Would you need to have a merged research note potentially attached to each and every piece of info (e.g. a date in an event), and have each piece of the merged research note sourced by all the sources referred to by its subordinate evidence/conclusion persons (the context that the note-piece is written in) - in order to make any sense? How will you present those sentences, written in the context of some of the sources and other "sub-research notes", to a user in an application where the user has no idea what E&C is and will probably read it in one note?

Geir

I see the analogy, but I am afraid the only thing you achieve by calling my "research notes" or "reasoning" for sources is unnecessary confusion.

gthorud 2011-03-05T16:19:16-08:00

My last para should have been #2.

GeneJ 2011-03-05T16:24:20-08:00

The form a proof argument takes depends on its complexity.

It's not uncommon for a proof argument to refer to (? linked to) multiple persons and events and, almost always, to multiple sources.

If a proof argument is short, it may well fit into a full reference note. Depending on the nature of the proof argument, one may want to write it into the body of the database (ala, it could be a part of the narrative generated from the genealogy software). When it's written into the body of the database, it's not uncommon at this time for the proof argument to span several tags (so that the user can control where the source references are placed).

Still again, because formatting can be such a challenge or for long, article length materials, some users just create the proof argument in their word processor, and make that a source to the database.

GeneJ 2011-03-05T16:27:23-08:00

errr.. its complexity and how prominently the user wants the material displayed.

gthorud 2011-03-05T16:35:47-08:00

GeneJ,

If your posting was in response to mine (it was written just a few minutes after) - I am not concerned with the length - I am concerned with the structure and presentation of many proof arguments that are merged automatically - preservation of the context they are written in (unless all that context (incl. source identification) is written into each argument as text - which may seldom be the case).

Geir

GeneJ 2011-03-05T16:57:11-08:00

Sorry, I should have been clear. I was responding to TestUser's reference to proof argument.

I'm an _Evidence Explained_ person. I hope we'll have more opportunity to review the functionality of the Model. It is hard for me at this time to see how the process I use is supported by the Model.

AdrianB38 2011-03-07T14:22:26-08:00

"It is hard for me at this time to see how the process I use is supported by the Model."

Gene - this is of concern to me, also. (Since I don't possess an EE, I'm assuming that your process is not dissimilar to that implied by the GPS.) I have this feeling that your research and analysis process would not change if you used the E&C Model. What would change is the way that you recorded the results. If your process demonstrates that X was born on DD/MM/YY in State Y, then that process wouldn't have changed. Nor would (presumably) your entry of the sources into a E&C-Model-compliant program.

What would change is that rather than alter your original details for X's birth, you'd create (in some way I can't possibly guess because it would depend on the evidence that you'd found) some evidence-persons for each of your sources, potentially a conclusion person to draw together just that data, and then a _new_ version of X that contained the new version of their birth data, plus all the _other_ details (e.g. marriage, occupations, residences, etc) from the previous version of X, to concoct the new story for X. And you'd leave the previous version of X there in case you need to revert.

I don't think that changes your research process - only the method of storing the data from that process in a genealogy program.

But I'm not quite sure on this yet since the E&C Data Model we have so far is tightly focussed on one area and I've not yet convinced myself that it doesn't need to look further still and explicitly document the logical arguments - or the stuff from your research log. And before you say it - no - I'm not yet convinced we _need_ it, but I have to run it through in my mind.

testuser42 2011-03-07T15:12:31-08:00

Somewhere further up, Tom said in response to me:

...instead of making a new conclusion record you simply add a new EP to an array under an existing CP. This works fine and is what I believe you should do AS LONG AS THE PROOF STATEMENT FOR THE CP WOULD REMAIN THE SAME. This is the criterion you have to use when making the decision to either ADD A NEW conclusion person record or ADD TO AN EXISTING conclusion person. I think you need to do both -- you obviously are going to be able to find evidence that supports a previously made conclusion, as you are going to find evidence that leads to new conclusions.

Very good. Sounds like a very helpful rule. I think this would reduce the numbers of CPs a lot, but keep CPs from getting loaded with too many EPs.
But, I'd like to see an example for this. What exactly would make you rewrite the "proof statement" for a whole CP? Say, if you find another birth date, I guess it would be enough to add this to the existing CP and maybe write a short argument for the reasons why you included this piece of information.

(man, it would sure be easier if this was a usenet-style discussion where you could follow up on older posts...)

testuser42 2011-03-07T15:36:34-08:00

A bad? analogy about the NEED for support for E+C:

Not everybody needs it. It's like a wheelchair access ramp: it helps some people while others take the stairs. But when the "stairs" people suddenly are pushing a baby stroller, they will be able to use the ramp, too.

gthorud 2011-03-08T19:18:12-08:00

Apropos Usenet, I wonder if all material on this wiki will be around in 10 years?

Andy_Hatchett 2011-03-05T09:56:04-08:00

I'll just say this...

If BetterGEDCOM doesn't allow applications to follow the "merge and burn" practices of today then it forfeits any hope of being adopted by those applications' developers tomorrow.

ttwetmore 2011-03-05T10:55:23-08:00

The DeadEnds model allows every component of a record to have any number of source references. There is no pain in allowing this. Certainly an application using a strict evidence and conclusion approach wouldn't need this, but the model could be used by systems that use the merging approach.

The DeadEnds model allows each Person record to refer to an array of other Person records. The intention of this capability is to support the evidence and conclusion process by permitting the construction of conclusion trees of any size. But an application might devise other ways to use this capability. A merge and burn system could certainly use the model and simply ignore this capability.

I guess this is a long way of saying I agree with Andy that the Better GEDCOM model must support today's systems with their merging ways, and must therefore allow all components to have their own sources.

testuser42 2011-03-05T11:26:04-08:00

Ah, interesting! This is a sensible way of recording "proof statements", I think.

But a Conclusion Person always has more than one Source - albeit indirectly, via the Evidence Persons it links to. Of course, you could add a Source element directly, too, and let it hold the "proof". But this doesn't get in the way of old-style applications, does it?

If you want to restrict every Person Record to only one "direct" Source, then that should be fine, too. The Software just creates Evidence Persons for every PFACT, so you can give every PFACT a Source. On the Screen, the Person will have many PFACTs with different Sources, just like now. The users won't see a difference.

A bit OT, but anyway:
If you create new CPs all the time, it might get a bit "untidy", at least for a human eye. IIRC, GenXML had only one CP - and it might be enough. What about this:

Don't create new Conclusion Persons when you add a conclusion. Keep the CP and just add/change the link(s) for the new PFACT(s). The "reason" or "proof statement" then attaches to the link(s).
You would usually only have one CP for every Real Life Person. The PFACTs of this CP would link to any number of Evidence Persons.
Old conclusions (=old links) usually don't get deleted, but "demoted". If you want to "merge and burn", then you can delete the old links.

testuser42 2011-03-05T11:30:54-08:00

I started composing the last entry before Tom's response to Andy.

So, let's just allow any number of Sources in any Person. It leaves more opportunities.

If you look at the number of Sources a Person has, you see what kind of Person it is. An Evidence Person has one Source only. A CP has more - because every CP links to at least two EPs.

gthorud 2011-03-05T13:53:20-08:00

BG must allow programs that do not support E&C.
Also, as I see it, BG has stated that we are going to be backwards compatible so an E&C program will have to be able to import data not using evidence persons. So there can be no question that BG must support Conclusion records, and they can have several sources (or rather citations).

A question that I am not sure has been settled in previous discussion(s) is whether the bits and pieces of an event must be sourceable, and what the relation is between an event-level citation and the bits and pieces in the event. (I guess the same problem applies to all levels.)
See http://bettergedcom.wikispaces.com/message/view/BetterGEDCOM+Comparisons/32431138?o=20

A name must obviously be sourceable, also gender.

An observation that may need a separate discussion is: An importing application supporting only conclusions needs to do a merge, and the rules for this must be standardized.

A detailed remark: I would not call "the user's proof statement that justifies making the conclusion" a source, rather a research note.

gthorud 2011-03-05T14:03:02-08:00

I think a detailed example using DIAGRAMS is needed for everyone to understand the E&C model.
It should describe the entities with examples, and what happens when new sources are added, and what happens when evidence persons are merged into conclusion persons.

The reasons described by testuser could be used to argue for parts of the model - but I am not sure he/she covered all aspects, e.g. conclusion persons at a sub level.

Also, it would be interesting to explore some possible sketchy solutions for a user interface, cf. Ancestry Insider and nFS. I think that will be the proof of an implementable concept.

But that is a big task......

What is the Evidence and Conclusion Process and why is it important to BetterGEDCOM

What do we mean when we say the "Evidence and Conclusion Process"?

Why is BetterGEDCOM discussing the "Evidence and Conclusion Process"?

Comments